Copyright (c) 2014-2018, Sebastian Raschka. All rights reserved.
+
Copyright (c) 2014-2019, Sebastian Raschka. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
@@ -990,7 +998,7 @@
Under the following terms:
diff --git a/docs/_site/site/search/search_index.json b/docs/_site/site/search/search_index.json
index af1e5d378..407626cff 100644
--- a/docs/_site/site/search/search_index.json
+++ b/docs/_site/site/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to mlxtend's documentation! Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Links Documentation: http://rasbt.github.io/mlxtend Source code repository: https://github.com/rasbt/mlxtend PyPI: https://pypi.python.org/pypi/mlxtend Questions? Check out the Google Groups mailing list Examples import numpy as np import matplotlib.pyplot as plt import matplotlib.gridspec as gridspec import itertools from sklearn.linear_model import LogisticRegression from sklearn.svm import SVC from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import EnsembleVoteClassifier from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions # Initializing Classifiers clf1 = LogisticRegression(random_state=0) clf2 = RandomForestClassifier(random_state=0) clf3 = SVC(random_state=0, probability=True) eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[2, 1, 1], voting='soft') # Loading some example data X, y = iris_data() X = X[:,[0, 2]] # Plotting Decision Regions gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10, 8)) labels = ['Logistic Regression', 'Random Forest', 'RBF kernel SVM', 'Ensemble'] for clf, lab, grd in zip([clf1, clf2, clf3, eclf], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show() If you use mlxtend as part of your workflow in a scientific publication, please consider citing the mlxtend repository with the following DOI: @article{raschkas_2018_mlxtend, author = {Sebastian Raschka}, title = {MLxtend: Providing machine learning and data science utilities and extensions to Python\u2019s scientific computing stack}, journal = {The Journal of Open Source Software}, volume = {3}, number = {24}, month = apr, year = 
2018, publisher = {The Open Journal}, doi = {10.21105/joss.00638}, url = {http://joss.theoj.org/papers/10.21105/joss.00638} } License This project is released under a permissive new BSD open source license ( LICENSE-BSD3.txt ) and commercially usable. There is no warranty; not even for merchantability or fitness for a particular purpose. In addition, you may use, copy, modify and redistribute all artistic creative works (figures and images) included in this distribution under the directory according to the terms and conditions of the Creative Commons Attribution 4.0 International License. See the file LICENSE-CC-BY.txt for details. (Computer-generated graphics such as the plots produced by matplotlib fall under the BSD license mentioned above). Contact I received a lot of feedback and questions about mlxtend recently, and I thought that it would be worthwhile to set up a public communication channel. Before you write an email with a question about mlxtend, please consider posting it here since it can also be useful to others! Please join the Google Groups Mailing List ! If Google Groups is not for you, please feel free to write me an email or consider filing an issue on GitHub's issue tracker for new feature requests or bug reports. In addition, I set up a Gitter channel for live discussions.","title":"Home"},{"location":"#welcome-to-mlxtends-documentation","text":"Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks.","title":"Welcome to mlxtend's documentation!"},{"location":"#links","text":"Documentation: http://rasbt.github.io/mlxtend Source code repository: https://github.com/rasbt/mlxtend PyPI: https://pypi.python.org/pypi/mlxtend Questions? 
Check out the Google Groups mailing list","title":"Links"},{"location":"#examples","text":"import numpy as np import matplotlib.pyplot as plt import matplotlib.gridspec as gridspec import itertools from sklearn.linear_model import LogisticRegression from sklearn.svm import SVC from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import EnsembleVoteClassifier from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions # Initializing Classifiers clf1 = LogisticRegression(random_state=0) clf2 = RandomForestClassifier(random_state=0) clf3 = SVC(random_state=0, probability=True) eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[2, 1, 1], voting='soft') # Loading some example data X, y = iris_data() X = X[:,[0, 2]] # Plotting Decision Regions gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10, 8)) labels = ['Logistic Regression', 'Random Forest', 'RBF kernel SVM', 'Ensemble'] for clf, lab, grd in zip([clf1, clf2, clf3, eclf], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show() If you use mlxtend as part of your workflow in a scientific publication, please consider citing the mlxtend repository with the following DOI: @article{raschkas_2018_mlxtend, author = {Sebastian Raschka}, title = {MLxtend: Providing machine learning and data science utilities and extensions to Python\u2019s scientific computing stack}, journal = {The Journal of Open Source Software}, volume = {3}, number = {24}, month = apr, year = 2018, publisher = {The Open Journal}, doi = {10.21105/joss.00638}, url = {http://joss.theoj.org/papers/10.21105/joss.00638} }","title":"Examples"},{"location":"#license","text":"This project is released under a permissive new BSD open source license ( LICENSE-BSD3.txt ) and commercially usable. 
There is no warranty; not even for merchantability or fitness for a particular purpose. In addition, you may use, copy, modify and redistribute all artistic creative works (figures and images) included in this distribution under the directory according to the terms and conditions of the Creative Commons Attribution 4.0 International License. See the file LICENSE-CC-BY.txt for details. (Computer-generated graphics such as the plots produced by matplotlib fall under the BSD license mentioned above).","title":"License"},{"location":"#contact","text":"I received a lot of feedback and questions about mlxtend recently, and I thought that it would be worthwhile to set up a public communication channel. Before you write an email with a question about mlxtend, please consider posting it here since it can also be useful to others! Please join the Google Groups Mailing List ! If Google Groups is not for you, please feel free to write me an email or consider filing an issue on GitHub's issue tracker for new feature requests or bug reports. In addition, I set up a Gitter channel for live discussions.","title":"Contact"},{"location":"CHANGELOG/","text":"Release Notes The CHANGELOG for the current development version is available at https://github.com/rasbt/mlxtend/blob/master/docs/sources/CHANGELOG.md . Version 0.14.0 (11-09-2018) Downloads Source code (zip) Source code (tar.gz) New Features Added a scatterplotmatrix function to the plotting module. ( #437 ) Added sample_weight option to StackingRegressor , StackingClassifier , StackingCVRegressor , StackingCVClassifier , EnsembleVoteClassifier . ( #438 ) Added a RandomHoldoutSplit class to perform a random train/valid split without rotation in SequentialFeatureSelector , scikit-learn GridSearchCV etc. ( #442 ) Added a PredefinedHoldoutSplit class to perform a train/valid split, based on user-specified indices, without rotation in SequentialFeatureSelector , scikit-learn GridSearchCV etc. 
( #443 ) Created a new mlxtend.image submodule for working on image processing-related tasks. ( #457 ) Added a new convenience function extract_face_landmarks based on dlib to mlxtend.image . ( #458 ) Added a method='oob' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the classic out-of-bag bootstrap estimate ( #459 ) Added a method='.632+' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the .632+ bootstrap estimate that addresses the optimism bias of the .632 bootstrap ( #459 ) Added a new mlxtend.evaluate.ftest function to perform an F-test for comparing the accuracies of two or more classification models. ( #460 ) Added a new mlxtend.evaluate.combined_ftest_5x2cv function to perform a combined 5x2cv F-test for comparing the performance of two models. ( #461 ) Added a new mlxtend.evaluate.difference_proportions test for comparing two proportions (e.g., classifier accuracies) ( #462 ) Changes Addressed deprecation warnings in NumPy 1.15. ( #425 ) Because of complications in PR ( #459 ), support for Python 2.7 has been dropped; since official support for Python 2.7 by the Python Software Foundation is ending in approx. 12 months anyway, this refocusing will hopefully free up some developer time that would otherwise be spent on backward compatibility. Bug Fixes Fixed an issue with a missing import in mlxtend.plotting.plot_confusion_matrix . ( #428 ) Version 0.13.0 (2018-07-20) Downloads Source code (zip) Source code (tar.gz) New Features A meaningful error message is now raised when a cross-validation generator is used with SequentialFeatureSelector . ( #377 ) The SequentialFeatureSelector now accepts custom feature names via the fit method for more interpretable feature subset reports. ( #379 ) The SequentialFeatureSelector is now also compatible with Pandas DataFrames and uses DataFrame column-names for more interpretable feature subset reports. ( #379 ) ColumnSelector now works with Pandas DataFrame columns. 
( #378 by Manuel Garrido ) The ExhaustiveFeatureSelector estimator in mlxtend.feature_selection is now safely stoppable mid-process by control+c. ( #380 ) Two new functions, vectorspace_orthonormalization and vectorspace_dimensionality , were added to mlxtend.math to use the Gram-Schmidt process to convert a set of linearly independent vectors into a set of orthonormal basis vectors, and to compute the dimensionality of a vectorspace, respectively. ( #382 ) mlxtend.frequent_patterns.apriori now supports pandas SparseDataFrame s to generate frequent itemsets. ( #404 via Daniel Morales ) The plot_confusion_matrix function can now show normalized confusion matrix coefficients in addition to, or instead of, absolute confusion matrix coefficients, with or without a colorbar. The text display method has been changed so that the full range of the colormap is used. The default size is also now set based on the number of classes. Added support for merging the meta features with the original input features in StackingRegressor (via use_features_in_secondary ), as is already supported in the other Stacking classes. ( #418 ) Added a support_only parameter to the association_rules function, which allows constructing association rules (based on the support metric only) for cropped input DataFrames that don't contain a complete set of antecedent and consequent support values. ( #421 ) Changes Itemsets generated with apriori are now frozenset s ( #393 by William Laney and #394 ) Now raises an error if an input DataFrame to apriori contains values other than 0, 1, True, and False. ( #419 ) Bug Fixes Allow mlxtend estimators to be cloned via scikit-learn's clone function. 
( #374 ) Fixes a bug to allow the correct use of refit=False in StackingRegressor and StackingCVRegressor ( #384 and #385 by selay01 ) Allow StackingClassifier to work with sparse matrices when use_features_in_secondary=True ( #408 by Floris Hoogenbook ) Allow StackingCVRegressor to work with sparse matrices when use_features_in_secondary=True ( #416 ) Allow StackingCVClassifier to work with sparse matrices when use_features_in_secondary=True ( #417 ) Version 0.12.0 (2018-04-21) Downloads Source code (zip) Source code (tar.gz) New Features A new feature_importance_permutation function to compute the feature importance in classifiers and regressors via the permutation importance method ( #358 ) The fit method of the ExhaustiveFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. ( #354 by Zach Griffith) The fit method of the SequentialFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. ( #350 by Zach Griffith) Changes Replaced the plot_decision_regions colors with a colorblind-friendly palette and added contour lines for decision regions. ( #348 ) All stacking estimators now raise a NotFittedError if any method for inference is called prior to fitting the estimator. ( #353 ) Renamed the refit parameter of both the StackingClassifier and StackingCVClassifier to use_clones to be more explicit and less misleading. ( #368 ) Bug Fixes Various changes in the documentation and documentation tools to fix formatting issues ( #363 ) Fixes a bug where the StackingCVClassifier 's meta features were not stored in the original order when shuffle=True ( #370 ) Many documentation improvements, including links to the User Guides in the API docs ( #371 ) Version 0.11.0 (2018-03-14) Downloads Source code (zip) Source code (tar.gz) New Features New function implementing the resampled paired t-test procedure ( paired_ttest_resampled ) to compare the performance of two models. 
( #323 ) New function implementing the k-fold paired t-test procedure ( paired_ttest_kfold_cv ) to compare the performance of two models (also called k-hold-out paired t-test). ( #324 ) New function implementing the 5x2cv paired t-test procedure ( paired_ttest_5x2cv ) proposed by Dietterich (1998) to compare the performance of two models. ( #325 ) A refit parameter was added to stacking classes (similar to the refit parameter in the EnsembleVoteClassifier ) to support classifiers and regressors that follow the scikit-learn API but are not compatible with scikit-learn's clone function. ( #322 ) The ColumnSelector now has a drop_axis argument to use it in pipelines with CountVectorizers . ( #333 ) Changes Raises an informative error message if predict or predict_meta_features is called prior to calling the fit method in StackingRegressor and StackingCVRegressor . ( #315 ) The plot_decision_regions function now automatically determines the optimal setting based on the feature dimensions and supports anti-aliasing. The old res parameter has been deprecated. ( #309 by Guillaume Poirier-Morency ) Apriori code is faster due to optimizations in the onehot transformation and a reduced number of candidates generated by the apriori algorithm. ( #327 by Jakub Smid ) The OnehotTransactions class (which is often used in combination with the apriori function for association rule mining) is now more memory efficient as it uses boolean arrays instead of integer arrays. In addition, the OnehotTransactions class can now be provided with a sparse argument to generate sparse representations of the onehot matrix to further improve memory efficiency. ( #328 by Jakub Smid ) The OnehotTransactions class has been deprecated and replaced by the TransactionEncoder . ( #332 ) The plot_decision_regions function now has three new parameters, scatter_kwargs , contourf_kwargs , and scatter_highlight_kwargs , that can be used to modify the plotting style. 
( #342 by James Bourbeau ) Bug Fixes Fixed an issue when class labels were provided to the EnsembleVoteClassifier when refit was set to false . ( #322 ) Allow arrays with 16-bit and 32-bit precision in the plot_decision_regions function. ( #337 ) Fixed a bug that raised an indexing error if the number of items was <= 1 when computing association rules using the conviction metric. ( #340 ) Version 0.10.0 (2017-12-22) Downloads Source code (zip) Source code (tar.gz) New Features New store_train_meta_features parameter for fit in StackingCVRegressor . If True, train meta-features are stored in self.train_meta_features_ . New pred_meta_features method for StackingCVRegressor , which returns the meta-features for test data. ( #294 via takashioya ) The new store_train_meta_features attribute and pred_meta_features method for the StackingCVRegressor were also added to the StackingRegressor , StackingClassifier , and StackingCVClassifier ( #299 & #300 ) New function ( evaluate.mcnemar_tables ) for creating multiple 2x2 contingency tables from model prediction arrays that can be used in multiple McNemar (post-hoc) tests or Cochran's Q or F tests, etc. ( #307 ) New function ( evaluate.cochrans_q ) for performing Cochran's Q test to compare the accuracy of multiple classifiers. ( #310 ) Changes Added requirements.txt to setup.py . ( #304 via Colin Carrol ) Bug Fixes Improved numerical stability for p-values computed via the exact McNemar test ( #306 ) nose is not required to use the library ( #302 ) Version 0.9.1 (2017-11-19) Downloads Source code (zip) Source code (tar.gz) New Features Added mlxtend.evaluate.bootstrap_point632_score to evaluate the performance of estimators using the .632 bootstrap. ( #283 ) New max_len parameter for the frequent itemset generation via the apriori function to allow for early stopping. ( #270 ) Changes All feature index tuples in SequentialFeatureSelector are now in sorted order. 
( #262 ) The SequentialFeatureSelector now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. ( #262 ) utils.Counter now accepts a name variable to help distinguish between multiple counters; time precision can be set with the 'precision' kwarg; and the new attribute end_time holds the time the last iteration completed. ( #278 via Mathew Savage ) Bug Fixes Fixed a deprecation error that occurred with the McNemar test when using SciPy 1.0. ( #283 ) Version 0.9.0 (2017-10-21) Downloads Source code (zip) Source code (tar.gz) New Features Added evaluate.permutation_test , a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution. In other words, it tests the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). ( #250 ) Added 'leverage' and 'conviction' as evaluation metrics to the frequent_patterns.association_rules function. ( #246 & #247 ) Added a loadings_ attribute to PrincipalComponentAnalysis to compute the factor loadings of the features on the principal components. ( #251 ) Allow grid search over classifiers/regressors in ensemble and stacking estimators. ( #259 ) New make_multiplexer_dataset function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. ( #263 ) Added a new BootstrapOutOfBag class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. 
( #265 ) The parameters for StackingClassifier , StackingCVClassifier , StackingRegressor , StackingCVRegressor , and EnsembleVoteClassifier can now be tuned using scikit-learn's GridSearchCV ( #254 via James Bourbeau ) Changes The 'support' column returned by frequent_patterns.association_rules was changed to compute the support of \"antecedent union consequent\", and new 'antecedent support' and 'consequent support' columns were added to avoid ambiguity. ( #245 ) Allow the OnehotTransactions class to be cloned via scikit-learn's clone function, which is required by, e.g., scikit-learn's FeatureUnion or GridSearchCV (via Iaroslav Shcherbatyi ). ( #249 ) Bug Fixes Fix issues with the self._init_time parameter in _IterativeModel subclasses. ( #256 ) Fix an imprecision bug that occurred in plot_ecdf when run on Python 2.7. ( #264 ) The vectors from SVD in PrincipalComponentAnalysis are now scaled so that solver='eigen' and solver='svd' store eigenvalues of the same magnitude. ( #251 ) Version 0.8.0 (2017-09-09) Downloads Source code (zip) Source code (tar.gz) New Features Added mlxtend.evaluate.bootstrap , which implements the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth) #232 SequentialFeatureSelector 's k_features now accepts a string argument \"best\" or \"parsimonious\" for more \"automated\" feature selection. For instance, if \"best\" is provided, the feature selector will return the feature subset with the best cross-validation performance. If \"parsimonious\" is provided as an argument, the smallest feature subset that is within one standard error of the cross-validation performance will be selected. 
#238 Changes SequentialFeatureSelector now uses np.nanmean instead of the regular mean to support scorers that may return np.nan #211 (via mrkaiser ) The skip_if_stuck parameter was removed from SequentialFeatureSelector in favor of a more efficient implementation comparing the conditional inclusion/exclusion results (in the floating versions) to the performances of previously sampled feature sets that were cached #237 ExhaustiveFeatureSelector was modified to consume substantially less memory #195 (via Adam Erickson ) Bug Fixes Fixed a bug where the SequentialFeatureSelector selected a feature subset larger than the one specified via the k_features tuple max-value #213 Version 0.7.0 (2017-06-22) Downloads Source code (zip) Source code (tar.gz) New Features New mlxtend.plotting.ecdf function for plotting empirical cumulative distribution functions ( #196 ). New StackingCVRegressor for stacking regressors with out-of-fold predictions to prevent overfitting ( #201 via Eike Dehling ). Changes The TensorFlow estimators have been removed from mlxtend, since TensorFlow now has very convenient ways to build estimators, which renders those implementations obsolete. plot_decision_regions now supports plotting decision regions for more than 2 training features ( #189 , via James Bourbeau ). Parallel execution in mlxtend.feature_selection.SequentialFeatureSelector and mlxtend.feature_selection.ExhaustiveFeatureSelector is now performed over different feature subsets instead of the different cross-validation folds to better utilize machines with multiple processors if the number of features is large ( #193 , via @whalebot-helmsman ). Raise meaningful error messages if pandas DataFrame s or Python lists of lists are fed into the StackingCVClassifier as fit arguments ( #198 ). The n_folds parameter of the StackingCVClassifier was changed to cv and can now accept any kind of cross-validation technique that is available from scikit-learn. 
For example, StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3)) or StackingCVClassifier(..., cv=GroupKFold(n_splits=3)) ( #203 , via Konstantinos Paliouras ). Bug Fixes SequentialFeatureSelector now correctly accepts a None argument for the scoring parameter to infer the default scoring metric from scikit-learn classifiers and regressors ( #171 ). The plot_decision_regions function now supports pre-existing axes objects generated via matplotlib's plt.subplots . ( #184 , see example ) Made math.num_combinations and math.num_permutations numerically stable for large numbers of combinations and permutations ( #200 ). Version 0.6.0 (2017-03-18) Downloads Source code (zip) Source code (tar.gz) New Features An association_rules function is implemented that allows generating rules based on a list of frequent itemsets (via Joshua Goerner ). Changes Adds a black edgecolor to plots via plotting.plot_decision_regions to make markers more distinguishable from the background in matplotlib>=2.0 . The association submodule was renamed to frequent_patterns . Bug Fixes The DataFrame index of apriori results is now unique and ordered. Fixed typos in autompg and wine datasets (via James Bourbeau ). Version 0.5.1 (2017-02-14) Downloads Source code (zip) Source code (tar.gz) New Features The EnsembleVoteClassifier has a new refit attribute that prevents refitting classifiers if refit=False to save computational time. Added a new lift_score function in evaluate to compute lift score (via Batuhan Bardak ). StackingClassifier and StackingRegressor support multivariate targets if the underlying models do (via kernc ). StackingClassifier has a new use_features_in_secondary attribute like StackingCVClassifier . Changes Changed default verbosity level in SequentialFeatureSelector to 0. The EnsembleVoteClassifier now raises a NotFittedError if the estimator wasn't fitted before calling predict . 
(via Anton Loss ) Added new TensorFlow variable initialization syntax to guarantee compatibility with TensorFlow 1.0 Bug Fixes Fixed the wrong default value for k_features in SequentialFeatureSelector Cast selected feature subsets in the SequentialFeatureSelector as sets to prevent the iterator from getting stuck if the k_idx are different permutations of the same combination (via Zac Wellmer ). Fixed an issue with learning curves that caused the performance metrics to be reversed (via ipashchenko ) Fixed a bug that could occur in the SequentialFeatureSelector if there are similarly well-performing subsets in the floating variants (via Zac Wellmer ). Version 0.5.0 (2016-11-09) Downloads Source code (zip) Source code (tar.gz) New Features New ExhaustiveFeatureSelector estimator in mlxtend.feature_selection for evaluating all feature combinations in a specified range The StackingClassifier has a new parameter average_probas that is set to True by default to maintain the current behavior. A deprecation warning was added, though, and it will default to False in future releases (0.6.0); average_probas=False will result in stacking of the level-1 predicted probabilities rather than averaging them. 
New StackingCVClassifier estimator in 'mlxtend.classifier' for implementing a stacking ensemble that uses cross-validation techniques for training the meta-estimator to avoid overfitting ( Reiichiro Nakano ) New OnehotTransactions encoder class added to the preprocessing submodule for transforming transaction data into a one-hot encoded array The SequentialFeatureSelector estimator in mlxtend.feature_selection now is safely stoppable mid-process by control+c, and deprecated print_progress in favor of a more tunable verbose parameter ( Will McGinnis ) New apriori function in association to extract frequent itemsets from transaction data for association rule mining New checkerboard_plot function in plotting to plot checkerboard tables / heat maps New mcnemar_table and mcnemar functions in evaluate to compute 2x2 contingency tables and McNemar's test Changes All plotting functions have been moved to mlxtend.plotting for compatibility reasons with continuous integration services and to make the installation of matplotlib optional for users of mlxtend 's core functionality Added a compatibility layer for scikit-learn 0.18 using the new model_selection module while maintaining backwards compatibility to scikit-learn 0.17. 
Bug Fixes mlxtend.plotting.plot_decision_regions now draws decision regions correctly if more than 4 class labels are present Raise AttributeError in plot_decision_regions when the X_highlight argument is a 1D array ( chkoar ) Version 0.4.2 (2016-08-24) Downloads Source code (zip) Source code (tar.gz) PDF documentation New Features Added preprocessing.CopyTransformer , a mock class that returns copies of input arrays via transform and fit_transform Changes Added AppVeyor to CI to ensure MS Windows compatibility Datasets are now saved as compressed .txt or .csv files rather than being imported as Python objects feature_selection.SequentialFeatureSelector now supports the selection of k_features using a tuple to specify a \"min-max\" k_features range Added an \"SVD solver\" option to the PrincipalComponentAnalysis Raise an AttributeError with a \"not fitted\" message in SequentialFeatureSelector if transform or get_metric_dict are called prior to fit Use small, positive bias units in TfMultiLayerPerceptron 's hidden layer(s) if the activations are ReLUs in order to avoid dead neurons Added an optional clone_estimator parameter to the SequentialFeatureSelector that defaults to True , avoiding the modification of the original estimator objects More rigorous type and shape checks in the evaluate.plot_decision_regions function DenseTransformer now doesn't raise an error if the input array is not sparse API clean-up using scikit-learn's BaseEstimator as parent class for feature_selection.ColumnSelector Bug Fixes Fixed a problem when a tuple-range was provided as argument to the SequentialFeatureSelector 's k_features parameter and the scoring metric was more negative than -1 (e.g., as in scikit-learn's MSE scoring function) (via wahutch ) Fixed an AttributeError issue when verbose > 1 in StackingClassifier Fixed a bug in classifier.SoftmaxRegression where the mean values of the offsets were used to update the bias units rather than their sum Fixed a rare 
bug in MLP _layer_mapping functions that caused the random number generation seeds for initializing weights and biases to be swapped Version 0.4.1 (2016-05-01) Downloads Source code (zip) Source code (tar.gz) PDF documentation New Features New TensorFlow estimator for Linear Regression ( tf_regressor.TfLinearRegression ) New k-means clustering estimator ( cluster.Kmeans ) New TensorFlow k-means clustering estimator ( tf_cluster.Kmeans ) Changes Due to refactoring of the estimator classes, the init_weights parameter of the fit methods was globally renamed to init_params Overall performance improvements of estimators due to code clean-up and refactoring Added additional checks for correct array types and more meaningful exception messages Added optional dropout to the tf_classifier.TfMultiLayerPerceptron classifier for regularization Added an optional decay parameter to the tf_classifier.TfMultiLayerPerceptron classifier for adaptive learning via an exponential decay of the learning rate eta Replaced the old NeuralNetMLP with the more streamlined MultiLayerPerceptron ( classifier.MultiLayerPerceptron ); now also with softmax in the output layer and categorical cross-entropy loss. 
Unified the init_params parameter for fit functions to continue training where the algorithm left off (if supported) Version 0.4.0 (2016-04-09) New Features New TfSoftmaxRegression classifier using TensorFlow ( tf_classifier.TfSoftmaxRegression ) New SoftmaxRegression classifier ( classifier.SoftmaxRegression ) New TfMultiLayerPerceptron classifier using TensorFlow ( tf_classifier.TfMultiLayerPerceptron ) New StackingRegressor ( regressor.StackingRegressor ) New StackingClassifier ( classifier.StackingClassifier ) New function for one-hot encoding of class labels ( preprocessing.one_hot ) Added GridSearch support to the SequentialFeatureSelector ( feature_selection.SequentialFeatureSelector ) evaluate.plot_decision_regions improvements: Function now handles y-class labels correctly if the array is of type float Correct handling of input arguments markers and colors Accept an existing Axes via the ax argument New print_progress parameter for all generalized models and multi-layer neural networks for printing time elapsed, ETA, and the cost of the current epoch Minibatch learning for classifier.LogisticRegression , classifier.Adaline , and regressor.LinearRegression plus streamlined API New Principal Component Analysis class via mlxtend.feature_extraction.PrincipalComponentAnalysis New RBF Kernel Principal Component Analysis class via mlxtend.feature_extraction.RBFKernelPCA New Linear Discriminant Analysis class via mlxtend.feature_extraction.LinearDiscriminantAnalysis Changes The column parameter in mlxtend.preprocessing.standardize now defaults to None to standardize all columns more conveniently Version 0.3.0 (2016-01-31) Downloads Source code (zip) Source code (tar.gz) New Features Added a progress bar tracker to classifier.NeuralNetMLP Added a function to score predicted vs. 
target class labels evaluate.scoring Added confusion matrix functions to create ( evaluate.confusion_matrix ) and plot ( evaluate.plot_confusion_matrix ) confusion matrices New style parameter and improved axis scaling in mlxtend.evaluate.plot_learning_curves Added loadlocal_mnist to mlxtend.data for streaming MNIST from local byte files into NumPy arrays New NeuralNetMLP parameters: random_weights , shuffle_init , shuffle_epoch New SFS features such as the generation of pandas DataFrame results tables and plotting functions (with confidence intervals, standard deviation, and standard error bars) Added support for regression estimators in SFS Added Boston housing dataset New shuffle parameter for classifier.NeuralNetMLP Changes The mlxtend.preprocessing.standardize function now optionally returns the parameters, which are estimated from the array, for re-use. A further improvement makes the standardize function smarter in order to avoid zero-division errors Cosmetic improvements to the evaluate.plot_decision_regions function such as hiding plot axes Renaming of classifier.EnsembleClassifier to classifier.EnsembleVoteClassifier Improved random weight initialization in Perceptron , Adaline , LinearRegression , and LogisticRegression Changed learning parameter of mlxtend.classifier.Adaline to solver and added \"normal equation\" as a closed-form solution solver Hide y-axis labels in mlxtend.evaluate.plot_decision_regions in one-dimensional evaluations Sequential Feature Selection algorithms were unified into a single SequentialFeatureSelector class with parameters to enable floating selection and toggle between forward and backward selection. 
Stratified sampling of MNIST (now 500x random samples from each of the 10 digit categories) Renaming mlxtend.plotting to mlxtend.general_plotting in order to distinguish general plotting functions from specialized utility functions such as evaluate.plot_decision_regions Version 0.2.9 (2015-07-14) Downloads Source code (zip) Source code (tar.gz) New Features Sequential Feature Selection algorithms: SFS, SFFS, SBS, and SFBS Changes Changed regularization & lambda parameters in LogisticRegression to a single parameter l2_lambda Version 0.2.8 (2015-06-27) API changes: mlxtend.sklearn.EnsembleClassifier -> mlxtend.classifier.EnsembleClassifier mlxtend.sklearn.ColumnSelector -> mlxtend.feature_selection.ColumnSelector mlxtend.sklearn.DenseTransformer -> mlxtend.preprocessing.DenseTransformer mlxtend.pandas.standardizing -> mlxtend.preprocessing.standardizing mlxtend.pandas.minmax_scaling -> mlxtend.preprocessing.minmax_scaling mlxtend.matplotlib -> mlxtend.plotting Added momentum learning parameter (alpha coefficient) to mlxtend.classifier.NeuralNetMLP . Added adaptive learning rate (decrease constant) to mlxtend.classifier.NeuralNetMLP . mlxtend.pandas.minmax_scaling became mlxtend.preprocessing.minmax_scaling and also supports NumPy arrays now mlxtend.pandas.standardizing became mlxtend.preprocessing.standardizing and now supports both NumPy arrays and pandas DataFrames; also, a new ddof parameter sets the degrees of freedom when calculating the standard deviation Version 0.2.7 (2015-06-20) Added multilayer perceptron (feedforward artificial neural network) classifier as mlxtend.classifier.NeuralNetMLP . 
Added 5000 labeled training samples from the MNIST handwritten digits dataset to mlxtend.data Version 0.2.6 (2015-05-08) Added ordinary least squares regression using different solvers (gradient and stochastic gradient descent, and the closed-form solution (normal equation)) Added option for random weight initialization to the logistic regression classifier and updated l2 regularization Added wine dataset to mlxtend.data Added invert_axes parameter to mlxtend.matplotlib.enrichment_plot to optionally plot the \"Count\" on the x-axis New verbose parameter for mlxtend.sklearn.EnsembleClassifier by Alejandro C. Bahnsen Added mlxtend.pandas.standardizing to standardize columns in a pandas DataFrame Added parameters linestyles and markers to mlxtend.matplotlib.enrichment_plot mlxtend.regression.lin_regplot automatically adds np.newaxis and works with Python lists Added tokenizers: mlxtend.text.extract_emoticons and mlxtend.text.extract_words_and_emoticons Version 0.2.5 (2015-04-17) Added Sequential Backward Selection (mlxtend.sklearn.SBS) Added X_highlight parameter to mlxtend.evaluate.plot_decision_regions for highlighting test data points. Added mlxtend.regression.lin_regplot to plot the fitted line from linear regression. Added mlxtend.matplotlib.stacked_barplot to conveniently produce stacked barplots using pandas DataFrame s. Added mlxtend.matplotlib.enrichment_plot Version 0.2.4 (2015-03-15) Added scoring to mlxtend.evaluate.learning_curves (by user pfsq) Fixed setup.py bug caused by the missing README.html file matplotlib.category_scatter for pandas DataFrames and NumPy arrays Version 0.2.3 (2015-03-11) Added Logistic regression Gradient descent and stochastic gradient descent perceptron was changed to Adaline (Adaptive Linear Neuron) Perceptron and Adaline for {0, 1} classes Added mlxtend.preprocessing.shuffle_arrays_unison function to shuffle one or more NumPy arrays. Added shuffle and random seed parameters to the stochastic gradient descent classifier. 
Added rstrip parameter to mlxtend.file_io.find_filegroups to allow trimming of base names. Added ignore_substring parameter to mlxtend.file_io.find_filegroups and find_files . Replaced .rstrip in mlxtend.file_io.find_filegroups with a more robust regex. Gridsearch support for mlxtend.sklearn.EnsembleClassifier Version 0.2.2 (2015-03-01) Improved robustness of EnsembleClassifier. Extended plot_decision_regions() functionality for plotting 1D decision boundaries. Function matplotlib.plot_decision_regions was reorganized to evaluate.plot_decision_regions . evaluate.plot_learning_curves() function added. Added Rosenblatt, gradient descent, and stochastic gradient descent perceptrons. Version 0.2.1 (2015-01-20) Added mlxtend.pandas.minmax_scaling - a function to rescale pandas DataFrame columns. Slight update to the EnsembleClassifier interface (additional voting parameter) Fixed EnsembleClassifier to return correct class labels if class labels are not integers from 0 to n. Added new matplotlib function to plot decision regions of classifiers. Version 0.2.0 (2015-01-13) Improved mlxtend.text.generalize_duplcheck to remove duplicates and prevent an endless-looping issue. Added recursive search parameter to mlxtend.file_io.find_files. Added check_ext parameter to mlxtend.file_io.find_files to search based on file extensions. Default parameter to ignore invisible files for mlxtend.file_io.find. Added transform and fit_transform to the EnsembleClassifier . Added mlxtend.file_io.find_filegroups function. Version 0.1.9 (2015-01-10) Implemented scikit-learn EnsembleClassifier (majority voting rule) class. Version 0.1.8 (2015-01-07) Improvements to mlxtend.text.generalize_names to handle certain Dutch last name prefixes (van, van der, de, etc.). Added mlxtend.text.generalize_name_duplcheck function to apply mlxtend.text.generalize_names function to a pandas DataFrame without creating duplicates. Version 0.1.7 (2015-01-07) Added text utilities with name generalization function. 
Added and file_io utilities. Version 0.1.6 (2015-01-04) Added combinations and permutations estimators. Version 0.1.5 (2014-12-11) Added DenseTransformer for pipelines and grid search. Version 0.1.4 (2014-08-20) mean_centering function is now a class that creates MeanCenterer objects that can be used to fit data via the fit method, and center data at the column means via the transform and fit_transform methods. Version 0.1.3 (2014-08-19) Added preprocessing module and mean_centering function. Version 0.1.2 (2014-08-19) Added matplotlib utilities and remove_borders function. Version 0.1.1 (2014-08-13) Simplified code for ColumnSelector.","title":"Release Notes"},{"location":"CHANGELOG/#release-notes","text":"The CHANGELOG for the current development version is available at https://github.com/rasbt/mlxtend/blob/master/docs/sources/CHANGELOG.md .","title":"Release Notes"},{"location":"CHANGELOG/#version-0140-11-09-2018","text":"","title":"Version 0.14.0 (11-09-2018)"},{"location":"CHANGELOG/#downloads","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features","text":"Added a scatterplotmatrix function to the plotting module. ( #437 ) Added sample_weight option to StackingRegressor , StackingClassifier , StackingCVRegressor , StackingCVClassifier , EnsembleVoteClassifier . ( #438 ) Added a RandomHoldoutSplit class to perform a random train/valid split without rotation in SequentialFeatureSelector , scikit-learn GridSearchCV etc. ( #442 ) Added a PredefinedHoldoutSplit class to perform a train/valid split, based on user-specified indices, without rotation in SequentialFeatureSelector , scikit-learn GridSearchCV etc. ( #443 ) Created a new mlxtend.image submodule for working on image processing-related tasks. ( #457 ) Added a new convenience function extract_face_landmarks based on dlib to mlxtend.image . 
( #458 ) Added a method='oob' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the classic out-of-bag bootstrap estimate ( #459 ) Added a method='.632+' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the .632+ bootstrap estimate that addresses the optimism bias of the .632 bootstrap ( #459 ) Added a new mlxtend.evaluate.ftest function to perform an F-test for comparing the accuracies of two or more classification models. ( #460 ) Added a new mlxtend.evaluate.combined_ftest_5x2cv function to perform a combined 5x2cv F-test for comparing the performance of two models. ( #461 ) Added a new mlxtend.evaluate.difference_proportions test for comparing two proportions (e.g., classifier accuracies) ( #462 )","title":"New Features"},{"location":"CHANGELOG/#changes","text":"Addressed deprecation warnings in NumPy 1.15. ( #425 ) Because of complications in PR ( #459 ), support for Python 2.7 was dropped; since official support for Python 2.7 by the Python Software Foundation ends in approx. 12 months anyway, this refocusing will hopefully free up some developer time that would otherwise be spent on maintaining backward compatibility","title":"Changes"},{"location":"CHANGELOG/#bug-fixes","text":"Fixed an issue with a missing import in mlxtend.plotting.plot_confusion_matrix . ( #428 )","title":"Bug Fixes"},{"location":"CHANGELOG/#version-0130-2018-07-20","text":"","title":"Version 0.13.0 (2018-07-20)"},{"location":"CHANGELOG/#downloads_1","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_1","text":"A meaningful error message is now raised when a cross-validation generator is used with SequentialFeatureSelector . ( #377 ) The SequentialFeatureSelector now accepts custom feature names via the fit method for more interpretable feature subset reports. 
( #379 ) The SequentialFeatureSelector is now also compatible with pandas DataFrames and uses DataFrame column names for more interpretable feature subset reports. ( #379 ) ColumnSelector now works with pandas DataFrame columns. ( #378 by Manuel Garrido ) The ExhaustiveFeatureSelector estimator in mlxtend.feature_selection is now safely stoppable mid-process by control+c. ( #380 ) Two new functions, vectorspace_orthonormalization and vectorspace_dimensionality , were added to mlxtend.math to use the Gram-Schmidt process to convert a set of linearly independent vectors into a set of orthonormal basis vectors, and to compute the dimensionality of a vectorspace, respectively. ( #382 ) mlxtend.frequent_patterns.apriori now supports pandas SparseDataFrame s to generate frequent itemsets. ( #404 via Daniel Morales ) The plot_confusion_matrix function now has the ability to show normalized confusion matrix coefficients in addition to or instead of absolute confusion matrix coefficients with or without a colorbar. The text display method has been changed so that the full range of the colormap is used. The default size is also now set based on the number of classes. Added support for merging the meta features with the original input features in StackingRegressor (via use_features_in_secondary ) as is already supported in the other Stacking classes. ( #418 ) Added a support_only parameter to the association_rules function, which allows constructing association rules (based on the support metric only) for cropped input DataFrames that don't contain a complete set of antecedent and consequent support values. ( #421 )","title":"New Features"},{"location":"CHANGELOG/#changes_1","text":"Itemsets generated with apriori are now frozenset s ( #393 by William Laney and #394 ) Now raises an error if an input DataFrame to apriori contains values other than 0, 1, True, or False. 
( #419 )","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_1","text":"Allow mlxtend estimators to be cloned via scikit-learn's clone function. ( #374 ) Fixed a bug to allow the correct use of refit=False in StackingRegressor and StackingCVRegressor ( #384 and #385 by selay01 ) Allow StackingClassifier to work with sparse matrices when use_features_in_secondary=True ( #408 by Floris Hoogenbook ) Allow StackingCVRegressor to work with sparse matrices when use_features_in_secondary=True ( #416 ) Allow StackingCVClassifier to work with sparse matrices when use_features_in_secondary=True ( #417 )","title":"Bug Fixes"},{"location":"CHANGELOG/#version-0120-2018-21-04","text":"","title":"Version 0.12.0 (2018-21-04)"},{"location":"CHANGELOG/#downloads_2","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_2","text":"A new feature_importance_permutation function to compute the feature importance in classifiers and regressors via the permutation importance method ( #358 ) The fit method of the ExhaustiveFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. ( #354 by Zach Griffith) The fit method of the SequentialFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. ( #350 by Zach Griffith)","title":"New Features"},{"location":"CHANGELOG/#changes_2","text":"Replaced the plot_decision_regions colors with a colorblind-friendly palette and added contour lines for decision regions. ( #348 ) All stacking estimators now raise a NotFittedError if any method for inference is called prior to fitting the estimator. ( #353 ) Renamed the refit parameter of both the StackingClassifier and StackingCVClassifier to use_clones to be more explicit and less misleading. 
( #368 )","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_2","text":"Various changes in the documentation and documentation tools to fix formatting issues ( #363 ) Fixed a bug where the StackingCVClassifier 's meta features were not stored in the original order when shuffle=True ( #370 ) Many documentation improvements, including links to the User Guides in the API docs ( #371 )","title":"Bug Fixes"},{"location":"CHANGELOG/#version-0110-2018-03-14","text":"","title":"Version 0.11.0 (2018-03-14)"},{"location":"CHANGELOG/#downloads_3","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_3","text":"New function implementing the resampled paired t-test procedure ( paired_ttest_resampled ) to compare the performance of two models. ( #323 ) New function implementing the k-fold paired t-test procedure ( paired_ttest_kfold_cv ) to compare the performance of two models (also called k-hold-out paired t-test). ( #324 ) New function implementing the 5x2cv paired t-test procedure ( paired_ttest_5x2cv ) proposed by Dietterich (1998) to compare the performance of two models. ( #325 ) A refit parameter was added to the stacking classes (similar to the refit parameter in the EnsembleVoteClassifier ) to support classifiers and regressors that follow the scikit-learn API but are not compatible with scikit-learn's clone function. ( #322 ) The ColumnSelector now has a drop_axis argument to use it in pipelines with CountVectorizers . ( #333 )","title":"New Features"},{"location":"CHANGELOG/#changes_3","text":"Raises an informative error message if predict or predict_meta_features is called prior to calling the fit method in StackingRegressor and StackingCVRegressor . ( #315 ) The plot_decision_regions function now automatically determines the optimal setting based on the feature dimensions and supports anti-aliasing. The old res parameter has been deprecated. 
( #309 by Guillaume Poirier-Morency ) Apriori code is faster due to optimizations in the one-hot transformation and a reduced number of candidates generated by the apriori algorithm. ( #327 by Jakub Smid ) The OnehotTransactions class (which is often used in combination with the apriori function for association rule mining) is now more memory efficient as it uses boolean arrays instead of integer arrays. In addition, the OnehotTransactions class can now be provided with a sparse argument to generate sparse representations of the one-hot matrix to further improve memory efficiency. ( #328 by Jakub Smid ) The OnehotTransactions class has been deprecated and replaced by the TransactionEncoder . ( #332 ) The plot_decision_regions function now has three new parameters, scatter_kwargs , contourf_kwargs , and scatter_highlight_kwargs , that can be used to modify the plotting style. ( #342 by James Bourbeau )","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_3","text":"Fixed an issue where class labels were provided to the EnsembleVoteClassifier when refit was set to False . ( #322 ) Allow arrays with 16-bit and 32-bit precision in the plot_decision_regions function. ( #337 ) Fixed a bug that raised an indexing error if the number of items was <= 1 when computing association rules using the conviction metric. ( #340 )","title":"Bug Fixes"},{"location":"CHANGELOG/#version-0100-2017-12-22","text":"","title":"Version 0.10.0 (2017-12-22)"},{"location":"CHANGELOG/#downloads_4","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_4","text":"New store_train_meta_features parameter for fit in StackingCVRegressor . If True, train meta-features are stored in self.train_meta_features_ . New pred_meta_features method for StackingCVRegressor . Test meta-features can be obtained via this method. 
( #294 via takashioya ) The new store_train_meta_features attribute and pred_meta_features method for the StackingCVRegressor were also added to the StackingRegressor , StackingClassifier , and StackingCVClassifier ( #299 & #300 ) New function ( evaluate.mcnemar_tables ) for creating multiple 2x2 contingency tables from model prediction arrays that can be used in multiple McNemar (post-hoc) tests or Cochran's Q or F tests, etc. ( #307 ) New function ( evaluate.cochrans_q ) for performing Cochran's Q test to compare the accuracy of multiple classifiers. ( #310 )","title":"New Features"},{"location":"CHANGELOG/#changes_4","text":"Added requirements.txt to setup.py . ( #304 via Colin Carrol )","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_4","text":"Improved numerical stability for p-values computed via the exact McNemar test ( #306 ) nose is no longer required to use the library ( #302 )","title":"Bug Fixes"},{"location":"CHANGELOG/#version-091-2017-11-19","text":"","title":"Version 0.9.1 (2017-11-19)"},{"location":"CHANGELOG/#downloads_5","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_5","text":"Added mlxtend.evaluate.bootstrap_point632_score to evaluate the performance of estimators using the .632 bootstrap. ( #283 ) New max_len parameter for the frequent itemset generation via the apriori function to allow for early stopping. ( #270 )","title":"New Features"},{"location":"CHANGELOG/#changes_5","text":"All feature index tuples in SequentialFeatureSelector are now in sorted order. ( #262 ) The SequentialFeatureSelector now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. 
( #262 ) utils.Counter now accepts a name variable to help distinguish between multiple counters; the time precision can be set with the 'precision' kwarg; and the new attribute end_time holds the time at which the last iteration completed. ( #278 via Mathew Savage )","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_5","text":"Fixed a deprecation error that occurred with the McNemar test when using SciPy 1.0. ( #283 )","title":"Bug Fixes"},{"location":"CHANGELOG/#version-090-2017-10-21","text":"","title":"Version 0.9.0 (2017-10-21)"},{"location":"CHANGELOG/#downloads_6","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_6","text":"Added evaluate.permutation_test , a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution. In other words, it is a procedure to test the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). ( #250 ) Added 'leverage' and 'conviction' as evaluation metrics to the frequent_patterns.association_rules function. ( #246 & #247 ) Added a loadings_ attribute to PrincipalComponentAnalysis to compute the factor loadings of the features on the principal components. ( #251 ) Allow grid search over classifiers/regressors in ensemble and stacking estimators. ( #259 ) New make_multiplexer_dataset function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. ( #263 ) Added a new BootstrapOutOfBag class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. 
( #265 ) The parameters for StackingClassifier , StackingCVClassifier , StackingRegressor , StackingCVRegressor , and EnsembleVoteClassifier can now be tuned using scikit-learn's GridSearchCV ( #254 via James Bourbeau )","title":"New Features"},{"location":"CHANGELOG/#changes_6","text":"The 'support' column returned by frequent_patterns.association_rules was changed to compute the support of \"antecedent union consequent\", and new 'antecedent support' and 'consequent support' columns were added to avoid ambiguity. ( #245 ) Allow the OnehotTransactions class to be cloned via scikit-learn's clone function, which is required by, e.g., scikit-learn's FeatureUnion or GridSearchCV (via Iaroslav Shcherbatyi ). ( #249 )","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_6","text":"Fixed issues with the self._init_time parameter in _IterativeModel subclasses. ( #256 ) Fixed an imprecision bug that occurred in plot_ecdf when run on Python 2.7. ( #264 ) The vectors from SVD in PrincipalComponentAnalysis are now scaled so that solver='eigen' and solver='svd' store eigenvalues of the same magnitude. ( #251 )","title":"Bug Fixes"},{"location":"CHANGELOG/#version-080-2017-09-09","text":"","title":"Version 0.8.0 (2017-09-09)"},{"location":"CHANGELOG/#downloads_7","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_7","text":"Added mlxtend.evaluate.bootstrap , which implements the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth) #232 SequentialFeatureSelector 's k_features now accepts a string argument \"best\" or \"parsimonious\" for more \"automated\" feature selection. For instance, if \"best\" is provided, the feature selector will return the feature subset with the best cross-validation performance. 
If \"parsimonious\" is provided as an argument, the smallest feature subset that is within one standard error of the cross-validation performance will be selected. #238","title":"New Features"},{"location":"CHANGELOG/#changes_7","text":"SequentialFeatureSelector now uses np.nanmean instead of the regular mean to support scorers that may return np.nan #211 (via mrkaiser ) The skip_if_stuck parameter was removed from SequentialFeatureSelector in favor of a more efficient implementation comparing the conditional inclusion/exclusion results (in the floating versions) to the performances of previously sampled feature sets that were cached #237 ExhaustiveFeatureSelector was modified to consume substantially less memory #195 (via Adam Erickson )","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_7","text":"Fixed a bug where the SequentialFeatureSelector selected a feature subset larger than the one specified via the k_features tuple max-value #213","title":"Bug Fixes"},{"location":"CHANGELOG/#version-070-2017-06-22","text":"","title":"Version 0.7.0 (2017-06-22)"},{"location":"CHANGELOG/#downloads_8","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_8","text":"New mlxtend.plotting.ecdf function for plotting empirical cumulative distribution functions ( #196 ). New StackingCVRegressor for stacking regressors with out-of-fold predictions to prevent overfitting ( #201 via Eike Dehling ).","title":"New Features"},{"location":"CHANGELOG/#changes_8","text":"The TensorFlow estimators have been removed from mlxtend, since TensorFlow now provides very convenient ways to build estimators, which renders those implementations obsolete. plot_decision_regions now supports plotting decision regions for more than 2 training features ( #189 , via James Bourbeau ). 
Parallel execution in mlxtend.feature_selection.SequentialFeatureSelector and mlxtend.feature_selection.ExhaustiveFeatureSelector is now performed over different feature subsets instead of over the different cross-validation folds to better utilize machines with multiple processors if the number of features is large ( #193 , via @whalebot-helmsman ). Raises meaningful error messages if pandas DataFrame s or Python lists of lists are fed into the StackingCVClassifier as fit arguments ( #198 ). The n_folds parameter of the StackingCVClassifier was changed to cv and can now accept any kind of cross-validation technique that is available from scikit-learn. For example, StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3)) or StackingCVClassifier(..., cv=GroupKFold(n_splits=3)) ( #203 , via Konstantinos Paliouras ).","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_8","text":"SequentialFeatureSelector now correctly accepts a None argument for the scoring parameter to infer the default scoring metric from scikit-learn classifiers and regressors ( #171 ). The plot_decision_regions function now supports pre-existing axes objects generated via matplotlib's plt.subplots . ( #184 , see example ) Made math.num_combinations and math.num_permutations numerically stable for large numbers of combinations and permutations ( #200 ).","title":"Bug Fixes"},{"location":"CHANGELOG/#version-060-2017-03-18","text":"","title":"Version 0.6.0 (2017-03-18)"},{"location":"CHANGELOG/#downloads_9","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_9","text":"An association_rules function is implemented that generates rules based on a list of frequent itemsets (via Joshua Goerner ).","title":"New Features"},{"location":"CHANGELOG/#changes_9","text":"Added a black edgecolor to plots via plotting.plot_decision_regions to make markers more distinguishable from the background in matplotlib>=2.0 . 
The association submodule was renamed to frequent_patterns .","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_9","text":"The DataFrame index of apriori results is now unique and ordered. Fixed typos in the autompg and wine datasets (via James Bourbeau ).","title":"Bug Fixes"},{"location":"CHANGELOG/#version-051-2017-02-14","text":"","title":"Version 0.5.1 (2017-02-14)"},{"location":"CHANGELOG/#downloads_10","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_10","text":"The EnsembleVoteClassifier has a new refit attribute that prevents refitting classifiers if refit=False to save computational time. Added a new lift_score function in evaluate to compute the lift score (via Batuhan Bardak ). StackingClassifier and StackingRegressor support multivariate targets if the underlying models do (via kernc ). StackingClassifier has a new use_features_in_secondary attribute like StackingCVClassifier .","title":"New Features"},{"location":"CHANGELOG/#changes_10","text":"Changed the default verbosity level in SequentialFeatureSelector to 0 The EnsembleVoteClassifier now raises a NotFittedError if the estimator wasn't fit before calling predict . (via Anton Loss ) Added new TensorFlow variable initialization syntax to guarantee compatibility with TensorFlow 1.0","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_10","text":"Fixed the wrong default value for k_features in SequentialFeatureSelector Cast the selected feature subsets in the SequentialFeatureSelector as sets to prevent the iterator from getting stuck if the k_idx are different permutations of the same combination (via Zac Wellmer ). 
Fixed an issue with learning curves that caused the performance metrics to be reversed (via ipashchenko ) Fixed a bug that could occur in the SequentialFeatureSelector if there are similarly well-performing subsets in the floating variants (via Zac Wellmer ).","title":"Bug Fixes"},{"location":"CHANGELOG/#version-050-2016-11-09","text":"","title":"Version 0.5.0 (2016-11-09)"},{"location":"CHANGELOG/#downloads_11","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_11","text":"New ExhaustiveFeatureSelector estimator in mlxtend.feature_selection for evaluating all feature combinations in a specified range The StackingClassifier has a new parameter average_probas that is set to True by default to maintain the current behavior. However, a deprecation warning was added, and it will default to False in future releases (0.6.0); average_probas=False will result in stacking of the level-1 predicted probabilities rather than averaging these. New StackingCVClassifier estimator in 'mlxtend.classifier' for implementing a stacking ensemble that uses cross-validation techniques for training the meta-estimator to avoid overfitting ( Reiichiro Nakano ) New OnehotTransactions encoder class added to the preprocessing submodule for transforming transaction data into a one-hot encoded array The SequentialFeatureSelector estimator in mlxtend.feature_selection is now safely stoppable mid-process by control+c, and print_progress was deprecated in favor of a more tunable verbose parameter ( Will McGinnis ) New apriori function in association to extract frequent itemsets from transaction data for association rule mining New checkerboard_plot function in plotting to plot checkerboard tables / heat maps New mcnemar_table and mcnemar functions in evaluate to compute 2x2 contingency tables and McNemar's test","title":"New Features"},{"location":"CHANGELOG/#changes_11","text":"All plotting functions have been moved to mlxtend.plotting for 
compatibility reasons with continuous integration services and to make the installation of matplotlib optional for users of mlxtend 's core functionality Added a compatibility layer for scikit-learn 0.18 using the new model_selection module while maintaining backward compatibility with scikit-learn 0.17.","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_11","text":"mlxtend.plotting.plot_decision_regions now draws decision regions correctly if more than 4 class labels are present Raises an AttributeError in plot_decision_regions when the X_highlight argument is a 1D array ( chkoar )","title":"Bug Fixes"},{"location":"CHANGELOG/#version-042-2016-08-24","text":"","title":"Version 0.4.2 (2016-08-24)"},{"location":"CHANGELOG/#downloads_12","text":"Source code (zip) Source code (tar.gz) PDF documentation","title":"Downloads"},{"location":"CHANGELOG/#new-features_12","text":"Added preprocessing.CopyTransformer , a mock class that returns copies of input arrays via transform and fit_transform","title":"New Features"},{"location":"CHANGELOG/#changes_12","text":"Added AppVeyor to CI to ensure MS Windows compatibility Datasets are now saved as compressed .txt or .csv files rather than being imported as Python objects feature_selection.SequentialFeatureSelector now supports the selection of k_features using a tuple to specify a \"min-max\" k_features range Added an \"SVD solver\" option to the PrincipalComponentAnalysis Raises an AttributeError with a \"not fitted\" message in SequentialFeatureSelector if transform or get_metric_dict are called prior to fit Use small, positive bias units in TfMultiLayerPerceptron 's hidden layer(s) if the activations are ReLUs in order to avoid dead neurons Added an optional clone_estimator parameter to the SequentialFeatureSelector that defaults to True , avoiding the modification of the original estimator objects More rigorous type and shape checks in the evaluate.plot_decision_regions function DenseTransformer now doesn't raise an error if the input 
array is not sparse API clean-up using scikit-learn's BaseEstimator as parent class for feature_selection.ColumnSelector","title":"Changes"},{"location":"CHANGELOG/#bug-fixes_12","text":"Fixed a problem when a tuple-range was provided as argument to the SequentialFeatureSelector 's k_features parameter and the scoring metric was more negative than -1 (e.g., as in scikit-learn's MSE scoring function) ( wahutch ) Fixed an AttributeError issue when verbose > 1 in StackingClassifier Fixed a bug in classifier.SoftmaxRegression where the mean values of the offsets were used to update the bias units rather than their sum Fixed a rare bug in MLP _layer_mapping functions that caused a swap between the random number generation seed when initializing weights and biases","title":"Bug Fixes"},{"location":"CHANGELOG/#version-041-2016-05-01","text":"","title":"Version 0.4.1 (2016-05-01)"},{"location":"CHANGELOG/#downloads_13","text":"Source code (zip) Source code (tar.gz) PDF documentation","title":"Downloads"},{"location":"CHANGELOG/#new-features_13","text":"New TensorFlow estimator for Linear Regression ( tf_regressor.TfLinearRegression ) New k-means clustering estimator ( cluster.Kmeans ) New TensorFlow k-means clustering estimator ( tf_cluster.Kmeans )","title":"New Features"},{"location":"CHANGELOG/#changes_13","text":"Due to refactoring of the estimator classes, the init_weights parameter of the fit methods was globally renamed to init_params Overall performance improvements of estimators due to code clean-up and refactoring Added several additional checks for correct array types and more meaningful exception messages Added optional dropout to the tf_classifier.TfMultiLayerPerceptron classifier for regularization Added an optional decay parameter to the tf_classifier.TfMultiLayerPerceptron classifier for adaptive learning via an exponential decay of the learning rate eta Replaced the old NeuralNetMLP with the more streamlined MultiLayerPerceptron ( 
classifier.MultiLayerPerceptron ); now also with softmax in the output layer and categorical cross-entropy loss. Unified init_params parameter for fit functions to continue training where the algorithm left off (if supported)","title":"Changes"},{"location":"CHANGELOG/#version-040-2016-04-09","text":"","title":"Version 0.4.0 (2016-04-09)"},{"location":"CHANGELOG/#new-features_14","text":"New TfSoftmaxRegression classifier using TensorFlow ( tf_classifier.TfSoftmaxRegression ) New SoftmaxRegression classifier ( classifier.SoftmaxRegression ) New TfMultiLayerPerceptron classifier using TensorFlow ( tf_classifier.TfMultiLayerPerceptron ) New StackingRegressor ( regressor.StackingRegressor ) New StackingClassifier ( classifier.StackingClassifier ) New function for one-hot encoding of class labels ( preprocessing.one_hot ) Added GridSearch support to the SequentialFeatureSelector ( feature_selection.SequentialFeatureSelector ) evaluate.plot_decision_regions improvements: Function now handles y class labels correctly if the array is of type float Correct handling of input arguments markers and colors Accept an existing Axes via the ax argument New print_progress parameter for all generalized models and multi-layer neural networks for printing time elapsed, ETA, and the current cost of the current epoch Minibatch learning for classifier.LogisticRegression , classifier.Adaline , and regressor.LinearRegression plus streamlined API New Principal Component Analysis class via mlxtend.feature_extraction.PrincipalComponentAnalysis New RBF Kernel Principal Component Analysis class via mlxtend.feature_extraction.RBFKernelPCA New Linear Discriminant Analysis class via mlxtend.feature_extraction.LinearDiscriminantAnalysis","title":"New Features"},{"location":"CHANGELOG/#changes_14","text":"The column parameter in mlxtend.preprocessing.standardize now defaults to None to standardize all columns more 
conveniently","title":"Changes"},{"location":"CHANGELOG/#version-030-2016-01-31","text":"","title":"Version 0.3.0 (2016-01-31)"},{"location":"CHANGELOG/#downloads_14","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_15","text":"Added a progress bar tracker to classifier.NeuralNetMLP Added a function to score predicted vs. target class labels ( evaluate.scoring ) Added confusion matrix functions to create ( evaluate.confusion_matrix ) and plot ( evaluate.plot_confusion_matrix ) confusion matrices New style parameter and improved axis scaling in mlxtend.evaluate.plot_learning_curves Added loadlocal_mnist to mlxtend.data for streaming MNIST from local byte files into NumPy arrays New NeuralNetMLP parameters: random_weights , shuffle_init , shuffle_epoch New SFS features such as the generation of pandas DataFrame results tables and plotting functions (with confidence intervals, standard deviation, and standard error bars) Added support for regression estimators in SFS Added Boston housing dataset New shuffle parameter for classifier.NeuralNetMLP","title":"New Features"},{"location":"CHANGELOG/#changes_15","text":"The mlxtend.preprocessing.standardize function now optionally returns the parameters, which are estimated from the array, for re-use. 
A further improvement makes the standardize function smarter in order to avoid zero-division errors Cosmetic improvements to the evaluate.plot_decision_regions function such as hiding plot axes Renaming of classifier.EnsembleClassifier to classifier.EnsembleVoteClassifier Improved random weight initialization in Perceptron , Adaline , LinearRegression , and LogisticRegression Changed learning parameter of mlxtend.classifier.Adaline to solver and added \"normal equation\" as closed-form solution solver Hide y-axis labels in mlxtend.evaluate.plot_decision_regions in 1-dimensional evaluations Sequential Feature Selection algorithms were unified into a single SequentialFeatureSelector class with parameters to enable floating selection and toggle between forward and backward selection. Stratified sampling of MNIST (now 500x random samples from each of the 10 digit categories) Renaming mlxtend.plotting to mlxtend.general_plotting in order to distinguish general plotting functions from specialized utility functions such as evaluate.plot_decision_regions","title":"Changes"},{"location":"CHANGELOG/#version-029-2015-07-14","text":"","title":"Version 0.2.9 (2015-07-14)"},{"location":"CHANGELOG/#downloads_15","text":"Source code (zip) Source code (tar.gz)","title":"Downloads"},{"location":"CHANGELOG/#new-features_16","text":"Sequential Feature Selection algorithms: SFS, SFFS, SBS, and SFBS","title":"New Features"},{"location":"CHANGELOG/#changes_16","text":"Changed regularization & lambda parameters in LogisticRegression to a single parameter l2_lambda","title":"Changes"},{"location":"CHANGELOG/#version-028-2015-06-27","text":"API changes: mlxtend.sklearn.EnsembleClassifier -> mlxtend.classifier.EnsembleClassifier mlxtend.sklearn.ColumnSelector -> mlxtend.feature_selection.ColumnSelector mlxtend.sklearn.DenseTransformer -> mlxtend.preprocessing.DenseTransformer mlxtend.pandas.standardizing -> mlxtend.preprocessing.standardizing mlxtend.pandas.minmax_scaling -> 
mlxtend.preprocessing.minmax_scaling mlxtend.matplotlib -> mlxtend.plotting Added momentum learning parameter (alpha coefficient) to mlxtend.classifier.NeuralNetMLP . Added adaptive learning rate (decrease constant) to mlxtend.classifier.NeuralNetMLP . mlxtend.pandas.minmax_scaling became mlxtend.preprocessing.minmax_scaling and also supports NumPy arrays now mlxtend.pandas.standardizing became mlxtend.preprocessing.standardizing and now supports both NumPy arrays and pandas DataFrames; also, a new ddof parameter to set the degrees of freedom when calculating the standard deviation","title":"Version 0.2.8 (2015-06-27)"},{"location":"CHANGELOG/#version-027-2015-06-20","text":"Added multilayer perceptron (feedforward artificial neural network) classifier as mlxtend.classifier.NeuralNetMLP . Added 5000 labeled training samples from the MNIST handwritten digits dataset to mlxtend.data","title":"Version 0.2.7 (2015-06-20)"},{"location":"CHANGELOG/#version-026-2015-05-08","text":"Added ordinary least squares regression using different solvers (gradient and stochastic gradient descent, and the closed-form solution (normal equation)) Added option for random weight initialization to logistic regression classifier and updated l2 regularization Added wine dataset to mlxtend.data Added invert_axes parameter to mlxtend.matplotlib.enrichment_plot to optionally plot the \"Count\" on the x-axis New verbose parameter for mlxtend.sklearn.EnsembleClassifier by Alejandro C. Bahnsen Added mlxtend.pandas.standardizing to standardize columns in a pandas DataFrame Added parameters linestyles and markers to mlxtend.matplotlib.enrichment_plot mlxtend.regression.lin_regplot automatically adds np.newaxis and works with 
Python lists Added tokenizers: mlxtend.text.extract_emoticons and mlxtend.text.extract_words_and_emoticons","title":"Version 0.2.6 (2015-05-08)"},{"location":"CHANGELOG/#version-025-2015-04-17","text":"Added Sequential Backward Selection (mlxtend.sklearn.SBS) Added X_highlight parameter to mlxtend.evaluate.plot_decision_regions for highlighting test data points. Added mlxtend.regression.lin_regplot to plot the fitted line from linear regression. Added mlxtend.matplotlib.stacked_barplot to conveniently produce stacked barplots using pandas DataFrame s. Added mlxtend.matplotlib.enrichment_plot","title":"Version 0.2.5 (2015-04-17)"},{"location":"CHANGELOG/#version-024-2015-03-15","text":"Added scoring to mlxtend.evaluate.learning_curves (by user pfsq) Fixed setup.py bug caused by the missing README.html file matplotlib.category_scatter for pandas DataFrames and NumPy arrays","title":"Version 0.2.4 (2015-03-15)"},{"location":"CHANGELOG/#version-023-2015-03-11","text":"Added Logistic regression Gradient descent and stochastic gradient descent perceptron was changed to Adaline (Adaptive Linear Neuron) Perceptron and Adaline for {0, 1} classes Added mlxtend.preprocessing.shuffle_arrays_unison function to shuffle one or more NumPy arrays. Added shuffle and random seed parameters to the stochastic gradient descent classifier. Added rstrip parameter to mlxtend.file_io.find_filegroups to allow trimming of base names. Added ignore_substring parameter to mlxtend.file_io.find_filegroups and find_files . Replaced .rstrip in mlxtend.file_io.find_filegroups with a more robust regex. Gridsearch support for mlxtend.sklearn.EnsembleClassifier","title":"Version 0.2.3 (2015-03-11)"},{"location":"CHANGELOG/#version-022-2015-03-01","text":"Improved robustness of EnsembleClassifier. Extended plot_decision_regions() functionality for plotting 1D decision boundaries. Function matplotlib.plot_decision_regions was reorganized to evaluate.plot_decision_regions . 
evaluate.plot_learning_curves() function added. Added Rosenblatt, gradient descent, and stochastic gradient descent perceptrons.","title":"Version 0.2.2 (2015-03-01)"},{"location":"CHANGELOG/#version-021-2015-01-20","text":"Added mlxtend.pandas.minmax_scaling - a function to rescale pandas DataFrame columns. Slight update to the EnsembleClassifier interface (additional voting parameter) Fixed EnsembleClassifier to return correct class labels if class labels are not integers from 0 to n. Added new matplotlib function to plot decision regions of classifiers.","title":"Version 0.2.1 (2015-01-20)"},{"location":"CHANGELOG/#version-020-2015-01-13","text":"Improved mlxtend.text.generalize_duplcheck to remove duplicates and prevent an endless looping issue. Added recursive search parameter to mlxtend.file_io.find_files. Added check_ext parameter to mlxtend.file_io.find_files to search based on file extensions. Default parameter to ignore invisible files for mlxtend.file_io.find. Added transform and fit_transform to the EnsembleClassifier . Added mlxtend.file_io.find_filegroups function.","title":"Version 0.2.0 (2015-01-13)"},{"location":"CHANGELOG/#version-019-2015-01-10","text":"Implemented scikit-learn EnsembleClassifier (majority voting rule) class.","title":"Version 0.1.9 (2015-01-10)"},{"location":"CHANGELOG/#version-018-2015-01-07","text":"Improvements to mlxtend.text.generalize_names to handle certain Dutch last name prefixes (van, van der, de, etc.). Added mlxtend.text.generalize_name_duplcheck function to apply mlxtend.text.generalize_names function to a pandas DataFrame without creating duplicates.","title":"Version 0.1.8 (2015-01-07)"},{"location":"CHANGELOG/#version-017-2015-01-07","text":"Added text utilities with name generalization function. 
Added and file_io utilities.","title":"Version 0.1.7 (2015-01-07)"},{"location":"CHANGELOG/#version-016-2015-01-04","text":"Added combinations and permutations estimators.","title":"Version 0.1.6 (2015-01-04)"},{"location":"CHANGELOG/#version-015-2014-12-11","text":"Added DenseTransformer for pipelines and grid search.","title":"Version 0.1.5 (2014-12-11)"},{"location":"CHANGELOG/#version-014-2014-08-20","text":"mean_centering function is now a class that creates MeanCenterer objects that can be used to fit data via the fit method, and center data at the column means via the transform and fit_transform methods.","title":"Version 0.1.4 (2014-08-20)"},{"location":"CHANGELOG/#version-013-2014-08-19","text":"Added preprocessing module and mean_centering function.","title":"Version 0.1.3 (2014-08-19)"},{"location":"CHANGELOG/#version-012-2014-08-19","text":"Added matplotlib utilities and remove_borders function.","title":"Version 0.1.2 (2014-08-19)"},{"location":"CHANGELOG/#version-011-2014-08-13","text":"Simplified code for ColumnSelector.","title":"Version 0.1.1 (2014-08-13)"},{"location":"CONTRIBUTING/","text":"How to Contribute I would be very happy about any kind of contribution that helps to improve and extend the functionality of mlxtend. Quick Contributor Checklist This is a quick checklist about the different steps of a typical contribution to mlxtend (and other open source projects). Consider copying this list to a local text file (or the issue tracker) and checking off items as you go. 
[ ] Open a new \"issue\" on GitHub to discuss the new feature / bug fix [ ] Fork the mlxtend repository from GitHub (if not already done earlier) [ ] Create and check out a new topic branch (please don't make modifications in the master branch) [ ] Implement the new feature or apply the bug fix [ ] Add appropriate unit test functions in mlxtend/*/tests [ ] Run nosetests ./mlxtend -sv and make sure that all unit tests pass [ ] Check/improve the test coverage by running nosetests ./mlxtend --with-coverage [ ] Check for style issues by running flake8 ./mlxtend (you may want to run nosetests again after you made modifications to the code) [ ] Add a note about the modification/contribution to the ./docs/sources/changelog.md file [ ] Modify documentation in the appropriate location under mlxtend/docs/sources/ [ ] Push the topic branch to the server and create a pull request [ ] Check that the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend [ ] Check/improve the unit test coverage at https://coveralls.io/github/rasbt/mlxtend [ ] Check/improve the code health at https://landscape.io/github/rasbt/mlxtend Tips for Contributors Getting Started - Creating a New Issue and Forking the Repository If you don't have a GitHub account yet, please create one to contribute to this project. Please submit a ticket for your issue to discuss the fix or new feature before too much time and effort is spent on the implementation. Fork the mlxtend repository from the GitHub web interface. Clone the mlxtend repository to your local machine by executing git clone https://github.com//mlxtend.git Syncing an Existing Fork If you already forked mlxtend earlier, you can bring your \"Fork\" up to date with the master branch as follows: 1. 
Configuring a remote that points to the upstream repository on GitHub List the currently configured remote repositories of your fork by executing $ git remote -v If you see something like origin https://github.com//mlxtend.git (fetch) origin https://github.com//mlxtend.git (push) you need to specify a new remote upstream repository via $ git remote add upstream https://github.com/rasbt/mlxtend.git Now, verify the new upstream repository you've specified for your fork by executing $ git remote -v You should see the following output if everything is configured correctly: origin https://github.com//mlxtend.git (fetch) origin https://github.com//mlxtend.git (push) upstream https://github.com/rasbt/mlxtend.git (fetch) upstream https://github.com/rasbt/mlxtend.git (push) 2. Syncing your Fork First, fetch the updates of the original project's master branch by executing: $ git fetch upstream You should see the following output remote: Counting objects: xx, done. remote: Compressing objects: 100% (xx/xx), done. remote: Total xx (delta xx), reused xx (delta x) Unpacking objects: 100% (xx/xx), done. From https://github.com/rasbt/mlxtend * [new branch] master -> upstream/master This means that the commits to the rasbt/mlxtend master branch are now stored in the local branch upstream/master . If you are not already on your local project's master branch, execute $ git checkout master Finally, merge the changes in upstream/master to your local master branch by executing $ git merge upstream/master which will give you an output that looks similar to Updating xxx...xxx Fast-forward SOME FILE1 | 12 +++++++ SOME FILE2 | 10 +++++++ 2 files changed, 22 insertions(+), The Main Workflow - Making Changes in a New Topic Branch Listed below are the 9 typical steps of a contribution. 1. Discussing the Feature or Modification Before you start coding, please discuss the new feature, bugfix, or other modification to the project on the project's issue tracker . 
Before you open a \"new issue,\" please do a quick search to see if a similar issue has been submitted already. 2. Creating a new feature branch Please avoid working directly on the master branch; instead, create a new feature branch: $ git branch Switch to the new feature branch by executing $ git checkout 3. Developing the new feature / bug fix Now it's time to modify existing code or to contribute new code to the project. 4. Testing your code Add the respective unit tests and check if they pass: $ nosetests -sv Use the --with-coverage flag to ensure that all code is being covered in the unit tests: $ nosetests --with-coverage 5. Documenting changes Please add an entry to the mlxtend/docs/sources/changelog.md file. If it is a new feature, it would also be nice if you could update the documentation in the appropriate location in mlxtend/sources . 6. Committing changes When you are ready to commit the changes, please provide a meaningful commit message: $ git add # or `git add .` $ git commit -m '' 7. Optional: squashing commits If you made multiple smaller commits, it would be nice if you could group them into a larger, summarizing commit. First, list your recent commits via Note Due to the improved GitHub UI, this is no longer necessary/encouraged. 
$ git log which will list the commits from newest to oldest in the following format by default: commit 046e3af8a9127df8eac879454f029937c8a31c41 Author: rasbt Date: Tue Nov 24 03:46:37 2015 -0500 fixed setup.py commit c3c00f6ba0e8f48bbe1c9081b8ae3817e57ecc5c Author: rasbt Date: Tue Nov 24 03:04:39 2015 -0500 documented feature x commit d87934fe8726c46f0b166d6290a3bf38915d6e75 Author: rasbt Date: Tue Nov 24 02:44:45 2015 -0500 added support for feature x Assuming that it would make sense to group these 3 commits into one, we can execute $ git rebase -i HEAD~3 which will bring up our default git editor with the following contents: pick d87934f added support for feature x pick c3c00f6 documented feature x pick 046e3af fixed setup.py Since c3c00f6 and 046e3af are related to the original commit of feature x , let's keep the d87934f and squash the 2 following commits into this initial one by changing the lines to pick d87934f added support for feature x squash c3c00f6 documented feature x squash 046e3af fixed setup.py Now, save the changes in your editor. Then, quitting the editor will apply the rebase changes, and the editor will open a second time, prompting you to enter a new commit message. In this case, we could enter support for feature x to summarize the contributions. 8. Uploading changes Push your changes to a topic branch to the git server by executing: $ git push origin 9. Submitting a pull request Go to your GitHub repository online, select the new feature branch, and submit a new pull request: Notes for Developers Building the documentation The documentation is built via MkDocs ; to ensure that the documentation is rendered correctly, you can view the documentation locally by executing mkdocs serve from the mlxtend/docs directory. For example, ~/github/mlxtend/docs$ mkdocs serve 1. 
Building the API documentation To build the API documentation, navigate to mlxtend/docs and execute the make_api.py file from this directory via ~/github/mlxtend/docs$ python make_api.py This should place the API documentation into the two correct directories: mlxtend/docs/sources/api_modules mlxtend/docs/sources/api_subpackes 2. Editing the User Guide The documents containing code examples for the \"User Guide\" are generated from IPython Notebook files. In order to convert an IPython notebook file to markdown after editing, please follow these steps: Modify or edit the existing notebook. Execute all cells in the current notebook and make sure that no errors occur. Convert the notebook to markdown using the ipynb2markdown.py converter ~/github/mlxtend/docs$ python ipynb2markdown.py --ipynb_path ./sources/user_guide/subpackage/notebookname.ipynb Note If you are adding a new document, please also include it in the pages section in the mlxtend/docs/mkdocs.yml file. 3. Building static HTML files of the documentation First, please check the documentation via localhost (http://127.0.0.1:8000/): ~/github/mlxtend/docs$ mkdocs serve Next, build the static HTML files of the mlxtend documentation via ~/github/mlxtend/docs$ mkdocs build --clean To deploy the documentation, execute ~/github/mlxtend/docs$ mkdocs gh-deploy --clean 4. Generate a PDF of the documentation To generate a PDF version of the documentation, simply cd into the mlxtend/docs directory and execute: python md2pdf.py Uploading a new version to PyPI 1. Creating a new testing environment Assuming we are using conda , create a new python environment via $ conda create -n 'mlxtend-testing' python=3 numpy scipy pandas Next, activate the environment by executing $ source activate mlxtend-testing 2. 
Installing the package from local files Test the installation by executing $ python setup.py install --record files.txt The --record files.txt flag will create a files.txt file listing the locations where these files will be installed. Try to import the package to see if it works, for example, by executing $ python -c 'import mlxtend; print(mlxtend.__file__)' If everything seems to be fine, remove the installation via $ cat files.txt | xargs rm -rf ; rm files.txt Next, test if pip is able to install the package. First, navigate to a different directory, and from there, install the package: $ pip install mlxtend and uninstall it again $ pip uninstall mlxtend 3. Deploying the package Consider deploying the package to the PyPI test server first. The setup instructions can be found here . $ python setup.py sdist bdist_wheel upload -r https://testpypi.python.org/pypi Test if it can be installed from there by executing $ pip install -i https://testpypi.python.org/pypi mlxtend and uninstall it $ pip uninstall mlxtend After this dry run succeeds, repeat this process using the \"real\" PyPI: $ python setup.py sdist bdist_wheel upload 4. Removing the virtual environment Finally, to clean up our local drive, remove the virtual testing environment via $ conda remove --name 'mlxtend-testing' --all 5. Updating the conda-forge recipe Once a new version of mlxtend has been uploaded to PyPI, update the conda-forge build recipe at https://github.com/conda-forge/mlxtend-feedstock by changing the version number in the recipe/meta.yaml file appropriately.","title":"How To Contribute"},{"location":"CONTRIBUTING/#how-to-contribute","text":"I would be very happy about any kind of contribution that helps to improve and extend the functionality of mlxtend.","title":"How to Contribute"},{"location":"CONTRIBUTING/#quick-contributor-checklist","text":"This is a quick checklist about the different steps of a typical contribution to mlxtend (and other open source projects). 
Consider copying this list to a local text file (or the issue tracker) and checking off items as you go. [ ] Open a new \"issue\" on GitHub to discuss the new feature / bug fix [ ] Fork the mlxtend repository from GitHub (if not already done earlier) [ ] Create and check out a new topic branch (please don't make modifications in the master branch) [ ] Implement the new feature or apply the bug fix [ ] Add appropriate unit test functions in mlxtend/*/tests [ ] Run nosetests ./mlxtend -sv and make sure that all unit tests pass [ ] Check/improve the test coverage by running nosetests ./mlxtend --with-coverage [ ] Check for style issues by running flake8 ./mlxtend (you may want to run nosetests again after you made modifications to the code) [ ] Add a note about the modification/contribution to the ./docs/sources/changelog.md file [ ] Modify documentation in the appropriate location under mlxtend/docs/sources/ [ ] Push the topic branch to the server and create a pull request [ ] Check that the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend [ ] Check/improve the unit test coverage at https://coveralls.io/github/rasbt/mlxtend [ ] Check/improve the code health at https://landscape.io/github/rasbt/mlxtend","title":"Quick Contributor Checklist"},{"location":"CONTRIBUTING/#tips-for-contributors","text":"","title":"Tips for Contributors"},{"location":"CONTRIBUTING/#getting-started-creating-a-new-issue-and-forking-the-repository","text":"If you don't have a GitHub account yet, please create one to contribute to this project. Please submit a ticket for your issue to discuss the fix or new feature before too much time and effort is spent on the implementation. Fork the mlxtend repository from the GitHub web interface. 
Clone the mlxtend repository to your local machine by executing git clone https://github.com//mlxtend.git","title":"Getting Started - Creating a New Issue and Forking the Repository"},{"location":"CONTRIBUTING/#syncing-an-existing-fork","text":"If you already forked mlxtend earlier, you can bring your \"Fork\" up to date with the master branch as follows:","title":"Syncing an Existing Fork"},{"location":"CONTRIBUTING/#1-configuring-a-remote-that-points-to-the-upstream-repository-on-github","text":"List the currently configured remote repositories of your fork by executing $ git remote -v If you see something like origin https://github.com//mlxtend.git (fetch) origin https://github.com//mlxtend.git (push) you need to specify a new remote upstream repository via $ git remote add upstream https://github.com/rasbt/mlxtend.git Now, verify the new upstream repository you've specified for your fork by executing $ git remote -v You should see the following output if everything is configured correctly: origin https://github.com//mlxtend.git (fetch) origin https://github.com//mlxtend.git (push) upstream https://github.com/rasbt/mlxtend.git (fetch) upstream https://github.com/rasbt/mlxtend.git (push)","title":"1. Configuring a remote that points to the upstream repository on GitHub"},{"location":"CONTRIBUTING/#2-syncing-your-fork","text":"First, fetch the updates of the original project's master branch by executing: $ git fetch upstream You should see the following output remote: Counting objects: xx, done. remote: Compressing objects: 100% (xx/xx), done. remote: Total xx (delta xx), reused xx (delta x) Unpacking objects: 100% (xx/xx), done. From https://github.com/rasbt/mlxtend * [new branch] master -> upstream/master This means that the commits to the rasbt/mlxtend master branch are now stored in the local branch upstream/master . 
If you are not already on your local project's master branch, execute $ git checkout master Finally, merge the changes in upstream/master to your local master branch by executing $ git merge upstream/master which will give you an output that looks similar to Updating xxx...xxx Fast-forward SOME FILE1 | 12 +++++++ SOME FILE2 | 10 +++++++ 2 files changed, 22 insertions(+),","title":"2. Syncing your Fork"},{"location":"CONTRIBUTING/#the-main-workflow-making-changes-in-a-new-topic-branch","text":"Listed below are the 9 typical steps of a contribution.","title":"The Main Workflow - Making Changes in a New Topic Branch"},{"location":"CONTRIBUTING/#1-discussing-the-feature-or-modification","text":"Before you start coding, please discuss the new feature, bugfix, or other modification to the project on the project's issue tracker . Before you open a \"new issue,\" please do a quick search to see if a similar issue has been submitted already.","title":"1. Discussing the Feature or Modification"},{"location":"CONTRIBUTING/#2-creating-a-new-feature-branch","text":"Please avoid working directly on the master branch; instead, create a new feature branch: $ git branch Switch to the new feature branch by executing $ git checkout ","title":"2. Creating a new feature branch"},{"location":"CONTRIBUTING/#3-developing-the-new-feature-bug-fix","text":"Now it's time to modify existing code or to contribute new code to the project.","title":"3. Developing the new feature / bug fix"},{"location":"CONTRIBUTING/#4-testing-your-code","text":"Add the respective unit tests and check if they pass: $ nosetests -sv Use the --with-coverage flag to ensure that all code is being covered in the unit tests: $ nosetests --with-coverage","title":"4. Testing your code"},{"location":"CONTRIBUTING/#5-documenting-changes","text":"Please add an entry to the mlxtend/docs/sources/changelog.md file. 
If it is a new feature, it would also be nice if you could update the documentation in the appropriate location in mlxtend/sources .","title":"5. Documenting changes"},{"location":"CONTRIBUTING/#6-committing-changes","text":"When you are ready to commit the changes, please provide a meaningful commit message: $ git add # or `git add .` $ git commit -m ''","title":"6. Committing changes"},{"location":"CONTRIBUTING/#7-optional-squashing-commits","text":"If you made multiple smaller commits, it would be nice if you could group them into a larger, summarizing commit. First, list your recent commits via Note Due to the improved GitHub UI, this is no longer necessary/encouraged. $ git log which will list the commits from newest to oldest in the following format by default: commit 046e3af8a9127df8eac879454f029937c8a31c41 Author: rasbt Date: Tue Nov 24 03:46:37 2015 -0500 fixed setup.py commit c3c00f6ba0e8f48bbe1c9081b8ae3817e57ecc5c Author: rasbt Date: Tue Nov 24 03:04:39 2015 -0500 documented feature x commit d87934fe8726c46f0b166d6290a3bf38915d6e75 Author: rasbt Date: Tue Nov 24 02:44:45 2015 -0500 added support for feature x Assuming that it would make sense to group these 3 commits into one, we can execute $ git rebase -i HEAD~3 which will bring up our default git editor with the following contents: pick d87934f added support for feature x pick c3c00f6 documented feature x pick 046e3af fixed setup.py Since c3c00f6 and 046e3af are related to the original commit of feature x , let's keep the d87934f and squash the 2 following commits into this initial one by changing the lines to pick d87934f added support for feature x squash c3c00f6 documented feature x squash 046e3af fixed setup.py Now, save the changes in your editor. Then, quitting the editor will apply the rebase changes, and the editor will open a second time, prompting you to enter a new commit message. In this case, we could enter support for feature x to summarize the contributions.","title":"7. 
Optional: squashing commits"},{"location":"CONTRIBUTING/#8-uploading-changes","text":"Push your topic branch to the git server by executing: $ git push origin ","title":"8. Uploading changes"},{"location":"CONTRIBUTING/#9-submitting-a-pull-request","text":"Go to your GitHub repository online, select the new feature branch, and submit a new pull request:","title":"9. Submitting a pull request"},{"location":"CONTRIBUTING/#notes-for-developers","text":"","title":"Notes for Developers"},{"location":"CONTRIBUTING/#building-the-documentation","text":"The documentation is built via MkDocs; to ensure that the documentation is rendered correctly, you can view the documentation locally by executing mkdocs serve from the mlxtend/docs directory. For example, ~/github/mlxtend/docs$ mkdocs serve","title":"Building the documentation"},{"location":"CONTRIBUTING/#1-building-the-api-documentation","text":"To build the API documentation, navigate to mlxtend/docs and execute the make_api.py file from this directory via ~/github/mlxtend/docs$ python make_api.py This should place the API documentation into the two correct directories: mlxtend/docs/sources/api_modules mlxtend/docs/sources/api_subpackes","title":"1. Building the API documentation"},{"location":"CONTRIBUTING/#2-editing-the-user-guide","text":"The documents containing code examples for the \"User Guide\" are generated from IPython Notebook files. To convert an IPython notebook file to markdown after editing, please follow these steps: Modify or edit the existing notebook. Execute all cells in the current notebook and make sure that no errors occur. Convert the notebook to markdown using the ipynb2markdown.py converter ~/github/mlxtend/docs$ python ipynb2markdown.py --ipynb_path ./sources/user_guide/subpackage/notebookname.ipynb Note If you are adding a new document, please also include it in the pages section in the mlxtend/docs/mkdocs.yml file.","title":"2.
Editing the User Guide"},{"location":"CONTRIBUTING/#3-building-static-html-files-of-the-documentation","text":"First, please check the documentation via localhost (http://127.0.0.1:8000/): ~/github/mlxtend/docs$ mkdocs serve Next, build the static HTML files of the mlxtend documentation via ~/github/mlxtend/docs$ mkdocs build --clean To deploy the documentation, execute ~/github/mlxtend/docs$ mkdocs gh-deploy --clean","title":"3. Building static HTML files of the documentation"},{"location":"CONTRIBUTING/#4-generate-a-pdf-of-the-documentation","text":"To generate a PDF version of the documentation, simply cd into the mlxtend/docs directory and execute: python md2pdf.py","title":"4. Generate a PDF of the documentation"},{"location":"CONTRIBUTING/#uploading-a-new-version-to-pypi","text":"","title":"Uploading a new version to PyPI"},{"location":"CONTRIBUTING/#1-creating-a-new-testing-environment","text":"Assuming we are using conda, create a new Python environment via $ conda create -n 'mlxtend-testing' python=3 numpy scipy pandas Next, activate the environment by executing $ source activate mlxtend-testing","title":"1. Creating a new testing environment"},{"location":"CONTRIBUTING/#2-installing-the-package-from-local-files","text":"Test the installation by executing $ python setup.py install --record files.txt The --record files.txt flag creates a files.txt file that lists the locations of all installed files. Try to import the package to see if it works, for example, by executing $ python -c 'import mlxtend; print(mlxtend.__file__)' If everything seems to be fine, remove the installation via $ cat files.txt | xargs rm -rf ; rm files.txt Next, test if pip is able to install the package. First, navigate to a different directory, and from there, install the package: $ pip install mlxtend and uninstall it again $ pip uninstall mlxtend","title":"2.
Installing the package from local files"},{"location":"CONTRIBUTING/#3-deploying-the-package","text":"Consider deploying the package to the PyPI test server first. The setup instructions can be found here. $ python setup.py sdist bdist_wheel upload -r https://testpypi.python.org/pypi Test if it can be installed from there by executing $ pip install -i https://testpypi.python.org/pypi mlxtend and uninstall it $ pip uninstall mlxtend After this dry run succeeds, repeat this process using the \"real\" PyPI: $ python setup.py sdist bdist_wheel upload","title":"3. Deploying the package"},{"location":"CONTRIBUTING/#4-removing-the-virtual-environment","text":"Finally, to clean up our local drive, remove the virtual testing environment via $ conda remove --name 'mlxtend-testing' --all","title":"4. Removing the virtual environment"},{"location":"CONTRIBUTING/#5-updating-the-conda-forge-recipe","text":"Once a new version of mlxtend has been uploaded to PyPI, update the conda-forge build recipe at https://github.com/conda-forge/mlxtend-feedstock by changing the version number in the recipe/meta.yaml file appropriately.","title":"5.
Updating the conda-forge recipe"},{"location":"USER_GUIDE_INDEX/","text":"User Guide Index classifier Adaline EnsembleVoteClassifier LogisticRegression MultiLayerPerceptron Perceptron SoftmaxRegression StackingClassifier StackingCVClassifier cluster Kmeans data autompg_data boston_housing_data iris_data loadlocal_mnist make_multiplexer_dataset mnist_data three_blobs_data wine_data evaluate bootstrap bootstrap_point632_score BootstrapOutOfBag cochrans_q confusion_matrix combined_ftest_5x2cv feature_importance_permutation ftest lift_score mcnemar_table mcnemar_tables mcnemar paired_ttest_5x2cv paired_ttest_kfold_cv paired_ttest_resampled permutation_test PredefinedHoldoutSplit proportion_difference RandomHoldoutSplit scoring feature_extraction LinearDiscriminantAnalysis PrincipalComponentAnalysis RBFKernelPCA feature_selection ColumnSelector ExhaustiveFeatureSelector SequentialFeatureSelector file_io find_filegroups find_files frequent_patterns apriori association_rules general concepts activation-functions gradient-optimization linear-gradient-derivative regularization-linear image extract_face_landmarks math num_combinations num_permutations plotting category_scatter checkerboard_plot ecdf enrichment_plot plot_confusion_matrix plot_decision_regions plot_learning_curves plot_linear_regression plot_sequential_feature_selection scatterplotmatrix stacked_barplot preprocessing CopyTransformer DenseTransformer MeanCenterer minmax_scaling one-hot_encoding shuffle_arrays_unison standardize TransactionEncoder regressor LinearRegression StackingCVRegressor StackingRegressor text generalize_names generalize_names_duplcheck tokenizer utils Counter","title":"User Guide Index"},{"location":"USER_GUIDE_INDEX/#user-guide-index","text":"","title":"User Guide Index"},{"location":"USER_GUIDE_INDEX/#classifier","text":"Adaline EnsembleVoteClassifier LogisticRegression MultiLayerPerceptron Perceptron SoftmaxRegression StackingClassifier 
StackingCVClassifier","title":"classifier"},{"location":"USER_GUIDE_INDEX/#cluster","text":"Kmeans","title":"cluster"},{"location":"USER_GUIDE_INDEX/#data","text":"autompg_data boston_housing_data iris_data loadlocal_mnist make_multiplexer_dataset mnist_data three_blobs_data wine_data","title":"data"},{"location":"USER_GUIDE_INDEX/#evaluate","text":"bootstrap bootstrap_point632_score BootstrapOutOfBag cochrans_q confusion_matrix combined_ftest_5x2cv feature_importance_permutation ftest lift_score mcnemar_table mcnemar_tables mcnemar paired_ttest_5x2cv paired_ttest_kfold_cv paired_ttest_resampled permutation_test PredefinedHoldoutSplit proportion_difference RandomHoldoutSplit scoring","title":"evaluate"},{"location":"USER_GUIDE_INDEX/#feature_extraction","text":"LinearDiscriminantAnalysis PrincipalComponentAnalysis RBFKernelPCA","title":"feature_extraction"},{"location":"USER_GUIDE_INDEX/#feature_selection","text":"ColumnSelector ExhaustiveFeatureSelector SequentialFeatureSelector","title":"feature_selection"},{"location":"USER_GUIDE_INDEX/#file_io","text":"find_filegroups find_files","title":"file_io"},{"location":"USER_GUIDE_INDEX/#frequent_patterns","text":"apriori association_rules","title":"frequent_patterns"},{"location":"USER_GUIDE_INDEX/#general-concepts","text":"activation-functions gradient-optimization linear-gradient-derivative regularization-linear","title":"general concepts"},{"location":"USER_GUIDE_INDEX/#image","text":"extract_face_landmarks","title":"image"},{"location":"USER_GUIDE_INDEX/#math","text":"num_combinations num_permutations","title":"math"},{"location":"USER_GUIDE_INDEX/#plotting","text":"category_scatter checkerboard_plot ecdf enrichment_plot plot_confusion_matrix plot_decision_regions plot_learning_curves plot_linear_regression plot_sequential_feature_selection scatterplotmatrix stacked_barplot","title":"plotting"},{"location":"USER_GUIDE_INDEX/#preprocessing","text":"CopyTransformer DenseTransformer MeanCenterer minmax_scaling 
one-hot_encoding shuffle_arrays_unison standardize TransactionEncoder","title":"preprocessing"},{"location":"USER_GUIDE_INDEX/#regressor","text":"LinearRegression StackingCVRegressor StackingRegressor","title":"regressor"},{"location":"USER_GUIDE_INDEX/#text","text":"generalize_names generalize_names_duplcheck tokenizer","title":"text"},{"location":"USER_GUIDE_INDEX/#utils","text":"Counter","title":"utils"},{"location":"cite/","text":"Citing mlxtend If you use mlxtend as part of your workflow in a scientific publication, please consider citing the mlxtend repository with the following DOI: Raschka, Sebastian (2018) MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack . J Open Source Softw 3(24). @article{raschkas_2018_mlxtend, author = {Sebastian Raschka}, title = {MLxtend: Providing machine learning and data science utilities and extensions to Python\u2019s scientific computing stack}, journal = {The Journal of Open Source Software}, volume = {3}, number = {24}, month = apr, year = 2018, publisher = {The Open Journal}, doi = {10.21105/joss.00638}, url = {http://joss.theoj.org/papers/10.21105/joss.00638} }","title":"Citing Mlxtend"},{"location":"cite/#citing-mlxtend","text":"If you use mlxtend as part of your workflow in a scientific publication, please consider citing the mlxtend repository with the following DOI: Raschka, Sebastian (2018) MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack . J Open Source Softw 3(24). 
@article{raschkas_2018_mlxtend, author = {Sebastian Raschka}, title = {MLxtend: Providing machine learning and data science utilities and extensions to Python\u2019s scientific computing stack}, journal = {The Journal of Open Source Software}, volume = {3}, number = {24}, month = apr, year = 2018, publisher = {The Open Journal}, doi = {10.21105/joss.00638}, url = {http://joss.theoj.org/papers/10.21105/joss.00638} }","title":"Citing mlxtend"},{"location":"contributors/","text":"Contributors For the current list of contributors to mlxtend, please see the GitHub contributor page at https://github.com/rasbt/mlxtend/graphs/contributors .","title":"Contributors"},{"location":"contributors/#contributors","text":"For the current list of contributors to mlxtend, please see the GitHub contributor page at https://github.com/rasbt/mlxtend/graphs/contributors .","title":"Contributors"},{"location":"discuss/","text":"Discuss Any questions or comments about mlxtend? Join the mlxtend mailing list on Google Groups!","title":"Discuss"},{"location":"discuss/#discuss","text":"Any questions or comments about mlxtend? Join the mlxtend mailing list on Google Groups!","title":"Discuss"},{"location":"installation/","text":"Installing mlxtend PyPI To install mlxtend, just execute pip install mlxtend Alternatively, you can download the package manually from the Python Package Index https://pypi.python.org/pypi/mlxtend, unzip it, navigate into the package, and use the command: python setup.py install Upgrading via pip To upgrade an existing version of mlxtend from PyPI, execute pip install mlxtend --upgrade --no-deps Please note that the dependencies (NumPy and SciPy) will also be upgraded if you omit the --no-deps flag; use the --no-deps (\"no dependencies\") flag if you don't want this.
Installing mlxtend from the source distribution In rare cases, users reported problems on certain systems with the default pip installation command, which installs mlxtend from the binary distribution (\"wheels\") on PyPI. If you encounter similar problems, you could try to install mlxtend from the source distribution instead via pip install --no-binary :all: mlxtend Also, I would appreciate it if you could report any issues that occur when using pip install mlxtend in the hope that we can fix these in future releases. Conda The mlxtend package is also available through conda-forge. To install mlxtend using conda, use the following command: conda install mlxtend --channel conda-forge or simply conda install mlxtend if you added conda-forge to your channels (conda config --add channels conda-forge). Dev Version The mlxtend version on PyPI may be one step behind; you can install the latest development version from the GitHub repository by executing pip install git+https://github.com/rasbt/mlxtend.git Or, you can fork the GitHub repository from https://github.com/rasbt/mlxtend and install mlxtend from your local drive via python setup.py install","title":"Installation"},{"location":"installation/#installing-mlxtend","text":"","title":"Installing mlxtend"},{"location":"installation/#pypi","text":"To install mlxtend, just execute pip install mlxtend Alternatively, you can download the package manually from the Python Package Index https://pypi.python.org/pypi/mlxtend, unzip it, navigate into the package, and use the command: python setup.py install","title":"PyPI"},{"location":"installation/#upgrading-via-pip","text":"To upgrade an existing version of mlxtend from PyPI, execute pip install mlxtend --upgrade --no-deps Please note that the dependencies (NumPy and SciPy) will also be upgraded if you omit the --no-deps flag; use the --no-deps (\"no dependencies\") flag if you don't want this.","title":"Upgrading via
pip"},{"location":"installation/#installing-mlxtend-from-the-source-distribution","text":"In rare cases, users reported problems on certain systems with the default pip installation command, which installs mlxtend from the binary distribution (\"wheels\") on PyPI. If you encounter similar problems, you could try to install mlxtend from the source distribution instead via pip install --no-binary :all: mlxtend Also, I would appreciate it if you could report any issues that occur when using pip install mlxtend in the hope that we can fix these in future releases.","title":"Installing mlxtend from the source distribution"},{"location":"installation/#conda","text":"The mlxtend package is also available through conda-forge. To install mlxtend using conda, use the following command: conda install mlxtend --channel conda-forge or simply conda install mlxtend if you added conda-forge to your channels (conda config --add channels conda-forge).","title":"Conda"},{"location":"installation/#dev-version","text":"The mlxtend version on PyPI may be one step behind; you can install the latest development version from the GitHub repository by executing pip install git+https://github.com/rasbt/mlxtend.git Or, you can fork the GitHub repository from https://github.com/rasbt/mlxtend and install mlxtend from your local drive via python setup.py install","title":"Dev Version"},{"location":"license/","text":"This project is released under a permissive new BSD open source license and is commercially usable. There is no warranty; not even for merchantability or fitness for a particular purpose. In addition, you may use, copy, modify, and redistribute all artistic creative works (figures and images) included in this distribution under the directory according to the terms and conditions of the Creative Commons Attribution 4.0 International License. (Computer-generated graphics such as the plots produced by matplotlib fall under the BSD license mentioned above).
new BSD License New BSD License Copyright (c) 2014-2018, Sebastian Raschka. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. Neither the name of mlxtend nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Creative Commons Attribution 4.0 International License mlxtend documentation figures are licensed under a Creative Commons Attribution 4.0 International License. http://creativecommons.org/licenses/by-sa/4.0/ . You are free to: Share \u2014 copy and redistribute the material in any medium or format Adapt \u2014 remix, transform, and build upon the material for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms. 
Under the following terms: Attribution \u2014 You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. No additional restrictions \u2014 You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.","title":"License"},{"location":"license/#new-bsd-license","text":"New BSD License Copyright (c) 2014-2018, Sebastian Raschka. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. Neither the name of mlxtend nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.","title":"new BSD License"},{"location":"license/#creative-commons-attribution-40-international-license","text":"mlxtend documentation figures are licensed under a Creative Commons Attribution 4.0 International License. http://creativecommons.org/licenses/by-sa/4.0/ .","title":"Creative Commons Attribution 4.0 International License"},{"location":"license/#you-are-free-to","text":"Share \u2014 copy and redistribute the material in any medium or format Adapt \u2014 remix, transform, and build upon the material for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms.","title":"You are free to:"},{"location":"license/#under-the-following-terms","text":"Attribution \u2014 You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. No additional restrictions \u2014 You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.","title":"Under the following terms:"},{"location":"api_modules/mlxtend.classifier/Adaline/","text":"Adaline Adaline(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) ADAptive LInear NEuron classifier. Note that this implementation of Adaline expects binary class labels in {0, 1}. 
Parameters eta : float (default: 0.01) learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution) If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr if not solver='normal equation' 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Sum of squared errors after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Adaline/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values.'
adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause","title":"Adaline"},{"location":"api_modules/mlxtend.classifier/Adaline/#adaline","text":"Adaline(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) ADAptive LInear NEuron classifier. Note that this implementation of Adaline expects binary class labels in {0, 1}. Parameters eta : float (default: 0.01) learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization.
If None: Normal Equations (closed-form solution) If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr if not solver='normal equation' 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Sum of squared errors after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Adaline/","title":"Adaline"},{"location":"api_modules/mlxtend.classifier/Adaline/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values.' 
adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.classifier/Adaline/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/Adaline/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object.
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/Adaline/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/Adaline/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/EnsembleVoteClassifier/","text":"EnsembleVoteClassifier EnsembleVoteClassifier(clfs, voting='hard', weights=None, verbose=0, refit=True) Soft Voting/Majority Rule classifier for scikit-learn estimators. Parameters clfs : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the EnsembleVoteClassifier will fit clones of those original classifiers that will be stored in the class attribute self.clfs_ if refit=True (default). voting : str, {'hard', 'soft'} (default='hard') If 'hard', uses predicted class labels for majority rule voting. Else if 'soft', predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. weights : array-like, shape = [n_classifiers], optional (default=None) Sequence of weights (float or int) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. verbose : int, optional (default=0) Controls the verbosity of the building process.
- verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the clf being fitted - verbose=2 : Prints info about the parameters of the clf being fitted - verbose>2 : Changes verbose param of the underlying clf to self.verbose - 2 refit : bool (default: True) Refits classifiers in clfs if True; otherwise, uses references to the clfs (assumes that the classifiers were already fit). Note: refit=False is incompatible with most scikit-learn wrappers! For instance, if any form of cross-validation is performed, this requires re-fitting the classifiers to the training folds, which would raise a NotFittedError if refit=False. (New in mlxtend v0.6.) Attributes classes_ : array-like, shape = [n_predictions] clf : array-like, shape = [n_predictions] The unmodified input classifiers clf_ : array-like, shape = [n_predictions] Fitted clones of the input classifiers Examples >>> import numpy as np >>> from sklearn.linear_model import LogisticRegression >>> from sklearn.naive_bayes import GaussianNB >>> from sklearn.ensemble import RandomForestClassifier >>> from mlxtend.classifier import EnsembleVoteClassifier >>> clf1 = LogisticRegression(random_state=1) >>> clf2 = RandomForestClassifier(random_state=1) >>> clf3 = GaussianNB() >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) >>> y = np.array([1, 1, 1, 2, 2, 2]) >>> eclf1 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... voting='hard', verbose=1) >>> eclf1 = eclf1.fit(X, y) >>> print(eclf1.predict(X)) [1 1 1 2 2 2] >>> eclf2 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting='soft') >>> eclf2 = eclf2.fit(X, y) >>> print(eclf2.predict(X)) [1 1 1 2 2 2] >>> eclf3 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ...
voting='soft', weights=[2,1,1]) >>> eclf3 = eclf3.fit(X, y) >>> print(eclf3.predict(X)) [1 1 1 2 2 2] >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/ Methods fit(X, y, sample_weight=None) Learn weight coefficients from training data for each classifier. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in the clfs list. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict class labels for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns maj : array-like, shape = [n_samples] Predicted class labels. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns avg : array-like, shape = [n_samples, n_classes] Weighted average probability for each class per sample.
score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X) Return class labels or probabilities for X for each estimator. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns If voting='soft' : array-like, shape = [n_classifiers, n_samples, n_classes] Class probabilities calculated by each classifier. If voting='hard' : array-like, shape = [n_classifiers, n_samples] Class labels predicted by each classifier.","title":"EnsembleVoteClassifier"},{"location":"api_modules/mlxtend.classifier/EnsembleVoteClassifier/#ensemblevoteclassifier","text":"EnsembleVoteClassifier(clfs, voting='hard', weights=None, verbose=0, refit=True) Soft Voting/Majority Rule classifier for scikit-learn estimators. Parameters clfs : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the EnsembleVoteClassifier will fit clones of those original classifiers that will be stored in the class attribute self.clfs_ if refit=True (default). voting : str, {'hard', 'soft'} (default='hard') If 'hard', uses predicted class labels for majority rule voting. 
Else if 'soft', predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. weights : array-like, shape = [n_classifiers], optional (default=None) Sequence of weights (float or int) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the clf being fitted - verbose=2 : Prints info about the parameters of the clf being fitted - verbose>2 : Changes verbose param of the underlying clf to self.verbose - 2 refit : bool (default: True) Refits classifiers in clfs if True; uses references to the clfs otherwise (assumes that the classifiers were already fit). Note: refit=False is incompatible with most scikit-learn wrappers! For instance, any form of cross-validation requires re-fitting the classifiers to the training folds, which would raise a NotFittedError if refit=False. (New in mlxtend v0.6.) Attributes classes_ : array-like, shape = [n_predictions] clfs : array-like, shape = [n_predictions] The unmodified input classifiers clfs_ : array-like, shape = [n_predictions] Fitted clones of the input classifiers Examples >>> import numpy as np >>> from sklearn.linear_model import LogisticRegression >>> from sklearn.naive_bayes import GaussianNB >>> from sklearn.ensemble import RandomForestClassifier >>> from mlxtend.classifier import EnsembleVoteClassifier >>> clf1 = LogisticRegression(random_state=1) >>> clf2 = RandomForestClassifier(random_state=1) >>> clf3 = GaussianNB() >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) >>> y = np.array([1, 1, 1, 2, 2, 2]) >>> eclf1 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... 
voting='hard', verbose=1) >>> eclf1 = eclf1.fit(X, y) >>> print(eclf1.predict(X)) [1 1 1 2 2 2] >>> eclf2 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting='soft') >>> eclf2 = eclf2.fit(X, y) >>> print(eclf2.predict(X)) [1 1 1 2 2 2] >>> eclf3 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... voting='soft', weights=[2,1,1]) >>> eclf3 = eclf3.fit(X, y) >>> print(eclf3.predict(X)) [1 1 1 2 2 2] >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/","title":"EnsembleVoteClassifier"},{"location":"api_modules/mlxtend.classifier/EnsembleVoteClassifier/#methods","text":"fit(X, y, sample_weight=None) Fit the classifiers in the ensemble to the training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in clfs. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict class labels for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns maj : array-like, shape = [n_samples] Predicted class labels. 
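For voting='hard', the weights scale each classifier's vote rather than its probabilities. A minimal NumPy sketch, assuming the labels are already encoded as non-negative integers in the [n_classifiers, n_samples] layout that transform(X) documents for voting='hard'. hard_vote is a hypothetical helper, not mlxtend's code, and in this sketch ties go to the smallest class index because np.argmax returns the first maximum.

```python
import numpy as np


def hard_vote(predictions, weights=None):
    # Hypothetical helper.
    # predictions: non-negative int labels, shape [n_classifiers, n_samples]
    # weights: length n_classifiers, or None for one vote per classifier
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # For each sample (column), add up the weights behind every class and
    # return the class with the largest total (ties -> smallest index).
    return np.apply_along_axis(
        lambda votes: np.argmax(np.bincount(votes, weights=weights, minlength=n_classes)),
        0,
        predictions,
    )


preds = [
    [0, 1, 1],  # classifier 1, three samples
    [1, 1, 0],  # classifier 2
    [1, 0, 0],  # classifier 3
]
print(hard_vote(preds))             # -> [1 1 0]
print(hard_vote(preds, [3, 1, 1]))  # -> [0 1 1]
```

Giving the first classifier a weight of 3 lets it outvote the other two whenever they agree with each other but not with it.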
predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns avg : array-like, shape = [n_samples, n_classes] Weighted average probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X) Return class labels or probabilities for X for each estimator. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns If voting='soft' : array-like, shape = [n_classifiers, n_samples, n_classes] Class probabilities calculated by each classifier. If voting='hard' : array-like, shape = [n_classifiers, n_samples] Class labels predicted by each classifier.","title":"Methods"},{"location":"api_modules/mlxtend.classifier/LogisticRegression/","text":"LogisticRegression LogisticRegression(eta=0.01, epochs=50, l2_lambda=0.0, minibatches=1, random_seed=None, print_progress=0) Logistic regression classifier. 
Note that this implementation of Logistic Regression expects binary class labels in {0, 1}. 
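Binary logistic regression of this kind (labels in {0, 1}, gradient-based training with an optional L2 penalty, and full-batch, minibatch, or online updates chosen through minibatches) can be sketched in NumPy as follows. This is an assumption-based illustration, not the mlxtend source: fit_logreg and its argument names are hypothetical, the cost is taken to be the summed cross-entropy plus (l2_lambda / 2) * ||w||^2, and the data are reshuffled each epoch only when minibatches > 1.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, y, eta=0.01, epochs=50, l2_lambda=0.0, minibatches=1, seed=None):
    # Hypothetical helper. Labels must be 0/1.
    # minibatches=1 -> full-batch gradient descent,
    # minibatches=len(y) -> online SGD, otherwise minibatch SGD.
    rng = np.random.RandomState(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    costs = []
    for _ in range(epochs):
        # Shuffle only when minibatches > 1.
        idx = rng.permutation(len(y)) if minibatches > 1 else np.arange(len(y))
        for batch in np.array_split(idx, minibatches):
            err = sigmoid(X[batch] @ w + b) - y[batch]  # d(cross-entropy)/d(net input)
            w -= eta * (X[batch].T @ err + l2_lambda * w)  # L2 gradient applied per minibatch
            b -= eta * err.sum()  # the bias is not regularized
        p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)  # avoid log(0)
        cost = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) + 0.5 * l2_lambda * (w @ w)
        costs.append(cost)
    return w, b, costs


X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
w, b, costs = fit_logreg(X, y, eta=0.1, epochs=100, minibatches=2, seed=1)
pred = (sigmoid(X @ w + b) >= 0.5).astype(int)
print(pred)  # -> [0 0 1 1]
```

The list of per-epoch costs plays the same role as the documented cost_ attribute, which makes it easy to check that training is converging.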
Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2_lambda : float Regularization parameter for L2 regularization. No regularization if l2_lambda=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list List of floats with cross_entropy cost (sgd or gd) for every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/LogisticRegression/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. 
adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class 1 probability : float score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause","title":"LogisticRegression"},{"location":"api_modules/mlxtend.classifier/LogisticRegression/#logisticregression","text":"LogisticRegression(eta=0.01, epochs=50, l2_lambda=0.0, minibatches=1, random_seed=None, print_progress=0) Logistic regression classifier. Note that this implementation of Logistic Regression expects binary class labels in {0, 1}. 
Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2_lambda : float Regularization parameter for L2 regularization. No regularization if l2_lambda=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list List of floats with cross_entropy cost (sgd or gd) for every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/LogisticRegression/","title":"LogisticRegression"},{"location":"api_modules/mlxtend.classifier/LogisticRegression/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. 
Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.classifier/LogisticRegression/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/LogisticRegression/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class 1 probability : float score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. 
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/LogisticRegression/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/LogisticRegression/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/MultiLayerPerceptron/","text":"MultiLayerPerceptron MultiLayerPerceptron(eta=0.5, epochs=50, hidden_layers=[50], n_classes=None, momentum=0.0, l1=0.0, l2=0.0, dropout=1.0, decrease_const=0.0, minibatches=1, random_seed=None, print_progress=0) Multi-layer perceptron classifier with logistic sigmoid activations. Parameters eta : float (default: 0.5) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. hidden_layers : list (default: [50]) Number of units per hidden layer. By default 50 units in the first hidden layer. At the moment, only 1 hidden layer is supported. n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. l1 : float (default: 0.0) L1 regularization strength l2 : float (default: 0.0) L2 regularization strength momentum : float (default: 0.0) Momentum constant. Factor multiplied with the gradient of the previous epoch t-1 to improve learning speed w(t) := w(t) - (grad(t) + momentum * grad(t-1)) decrease_const : float (default: 0.0) Decrease constant. 
Shrinks the learning rate after each epoch via eta / (1 + epoch*decrease_const) minibatches : int (default: 1) Divide the training data into k minibatches for accelerated stochastic gradient descent learning. Gradient Descent Learning if minibatches = 1 Stochastic Gradient Descent learning if minibatches = len(y) Minibatch learning if 1 < minibatches < len(y) random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape=[n_features, n_classes] Weights after fitting. b_ : 1D-array, shape=[n_classes] Bias units after fitting. cost_ : list List of floats; the mean categorical cross entropy cost after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/MultiLayerPerceptron/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape= [n_samples, n_classes] score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause","title":"MultiLayerPerceptron"},{"location":"api_modules/mlxtend.classifier/MultiLayerPerceptron/#multilayerperceptron","text":"MultiLayerPerceptron(eta=0.5, epochs=50, hidden_layers=[50], n_classes=None, momentum=0.0, l1=0.0, l2=0.0, dropout=1.0, decrease_const=0.0, minibatches=1, random_seed=None, print_progress=0) Multi-layer perceptron classifier with logistic sigmoid activations. Parameters eta : float (default: 0.5) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. 
Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. hidden_layers : list (default: [50]) Number of units per hidden layer. By default 50 units in the first hidden layer. At the moment, only 1 hidden layer is supported. n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. l1 : float (default: 0.0) L1 regularization strength l2 : float (default: 0.0) L2 regularization strength momentum : float (default: 0.0) Momentum constant. Factor multiplied with the gradient of the previous epoch t-1 to improve learning speed w(t) := w(t) - (grad(t) + momentum * grad(t-1)) decrease_const : float (default: 0.0) Decrease constant. Shrinks the learning rate after each epoch via eta / (1 + epoch*decrease_const) minibatches : int (default: 1) Divide the training data into k minibatches for accelerated stochastic gradient descent learning. Gradient Descent Learning if minibatches = 1 Stochastic Gradient Descent learning if minibatches = len(y) Minibatch learning if 1 < minibatches < len(y) random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape=[n_features, n_classes] Weights after fitting. b_ : 1D-array, shape=[n_classes] Bias units after fitting. cost_ : list List of floats; the mean categorical cross entropy cost after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/MultiLayerPerceptron/","title":"MultiLayerPerceptron"},{"location":"api_modules/mlxtend.classifier/MultiLayerPerceptron/#methods","text":"fit(X, y, init_params=True) Learn model from training data. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.classifier/MultiLayerPerceptron/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/MultiLayerPerceptron/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns Class probabilities : array-like, shape= [n_samples, n_classes] score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/MultiLayerPerceptron/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/MultiLayerPerceptron/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/Perceptron/","text":"Perceptron Perceptron(eta=0.1, epochs=50, random_seed=None, print_progress=0) Perceptron classifier. Note that this implementation of the Perceptron expects binary class labels in {0, 1}. Parameters eta : float (default: 0.1) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Number of passes over the training dataset. Prior to each epoch, the dataset is shuffled to prevent cycles. random_seed : int Random state for initializing random weights and shuffling. print_progress : int (default: 0) Prints progress in fitting to stderr. 
0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Number of misclassifications in every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Perceptron/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). 
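The Perceptron's cost_ attribute counts misclassifications per epoch. The classic Rosenblatt learning rule behind such a classifier can be sketched as follows. This is an assumption-based illustration, not mlxtend's code: fit_perceptron is a hypothetical helper, the weights start at zero instead of random values for simplicity, and a net input of exactly 0 is mapped to class 1.

```python
import numpy as np


def fit_perceptron(X, y, eta=0.1, epochs=50, seed=None):
    # Hypothetical helper; labels must be 0/1.
    rng = np.random.RandomState(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.zeros(X.shape[1])  # zero init for simplicity
    b = 0.0
    errors_per_epoch = []  # misclassifications per epoch, like the documented cost_
    for _ in range(epochs):
        errors = 0
        for i in rng.permutation(len(y)):  # shuffle each epoch to prevent cycles
            pred = int(X[i] @ w + b >= 0.0)  # unit step: net input >= 0 -> class 1
            update = eta * (y[i] - pred)  # 0 when the prediction is correct
            w += update * X[i]
            b += update
            errors += int(update != 0.0)
        errors_per_epoch.append(errors)
    return w, b, errors_per_epoch


X = np.array([[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])
w, b, errors = fit_perceptron(X, y, epochs=10, seed=0)
print(errors)                           # -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print((X @ w + b >= 0.0).astype(int))   # -> [0 0 1 1]
```

On this linearly separable toy set, a single update on the first class-0 sample is enough, so every later epoch records zero misclassifications.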
set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause","title":"Perceptron"},{"location":"api_modules/mlxtend.classifier/Perceptron/#perceptron","text":"Perceptron(eta=0.1, epochs=50, random_seed=None, print_progress=0) Perceptron classifier. Note that this implementation of the Perceptron expects binary class labels in {0, 1}. Parameters eta : float (default: 0.1) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Number of passes over the training dataset. Prior to each epoch, the dataset is shuffled to prevent cycles. random_seed : int Random state for initializing random weights and shuffling. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Number of misclassifications in every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Perceptron/","title":"Perceptron"},{"location":"api_modules/mlxtend.classifier/Perceptron/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. 
Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.classifier/Perceptron/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/Perceptron/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. 
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/Perceptron/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/Perceptron/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/SoftmaxRegression/","text":"SoftmaxRegression SoftmaxRegression(eta=0.01, epochs=50, l2=0.0, minibatches=1, n_classes=None, random_seed=None, print_progress=0) Softmax regression classifier. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2 : float Regularization parameter for L2 regularization. No regularization if l2=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. 
cost_ : list List of floats, the average cross-entropy for each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/SoftmaxRegression/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape = [n_samples, n_classes] score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause","title":"SoftmaxRegression"},{"location":"api_modules/mlxtend.classifier/SoftmaxRegression/#softmaxregression","text":"SoftmaxRegression(eta=0.01, epochs=50, l2=0.0, minibatches=1, n_classes=None, random_seed=None, print_progress=0) Softmax regression classifier. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2 : float Regularization parameter for L2 regularization. No regularization if l2=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting.
b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list List of floats, the average cross-entropy for each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/SoftmaxRegression/","title":"SoftmaxRegression"},{"location":"api_modules/mlxtend.classifier/SoftmaxRegression/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.classifier/SoftmaxRegression/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/SoftmaxRegression/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape = [n_samples, n_classes] score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/SoftmaxRegression/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.classifier/SoftmaxRegression/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.classifier/StackingCVClassifier/","text":"StackingCVClassifier StackingCVClassifier(classifiers, meta_classifier, use_probas=False, cv=2, use_features_in_secondary=False, stratify=True, shuffle=True, verbose=0, store_train_meta_features=False, use_clones=True) A 'Stacking Cross-Validation' classifier for scikit-learn estimators. New in mlxtend v0.4.3 Notes The StackingCVClassifier uses scikit-learn's check_cv internally, which doesn't support a random seed.
Thus, NumPy's random seed needs to be specified explicitly for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVClassifier. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingCVClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_ . meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. cv : int, cross-validation generator or an iterable, optional (default: 2) Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 2-fold cross validation, - integer, to specify the number of folds in a (Stratified)KFold, - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. For integer/None inputs, it will use either a KFold or StratifiedKFold cross validation depending on the value of the stratify argument. use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. stratify : bool (default: True) If True and the cv argument is an integer, it will follow a stratified K-Fold cross validation technique. If the cv argument is a specific cross validation technique, this argument is ignored. shuffle : bool (default: True) If True and the cv argument is an integer, the training data will be shuffled at fitting stage prior to cross-validation. If the cv argument is a specific cross validation technique, this argument is ignored. verbose : int, optional (default=0) Controls the verbosity of the building process.
- verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted and which fold is currently being used for fitting - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingCVClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] Meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/ Methods fit(X, y, groups=None, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values. groups : numpy array/None, shape = [n_samples] The group that each sample belongs to.
This is used by specific folding strategies such as GroupKFold(). sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weights to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] Predicted class labels. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self","title":"StackingCVClassifier"},{"location":"api_modules/mlxtend.classifier/StackingCVClassifier/#stackingcvclassifier","text":"StackingCVClassifier(classifiers, meta_classifier, use_probas=False, cv=2, use_features_in_secondary=False, stratify=True, shuffle=True, verbose=0, store_train_meta_features=False, use_clones=True) A 'Stacking Cross-Validation' classifier for scikit-learn estimators. New in mlxtend v0.4.3 Notes The StackingCVClassifier uses scikit-learn's check_cv internally, which doesn't support a random seed. Thus, NumPy's random seed needs to be specified explicitly for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVClassifier. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingCVClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_ . meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. cv : int, cross-validation generator or an iterable, optional (default: 2) Determines the cross-validation splitting strategy.
Possible inputs for cv are: - None, to use the default 2-fold cross validation, - integer, to specify the number of folds in a (Stratified)KFold, - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. For integer/None inputs, it will use either a KFold or StratifiedKFold cross validation depending on the value of the stratify argument. use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. stratify : bool (default: True) If True and the cv argument is an integer, it will follow a stratified K-Fold cross validation technique. If the cv argument is a specific cross validation technique, this argument is ignored. shuffle : bool (default: True) If True and the cv argument is an integer, the training data will be shuffled at fitting stage prior to cross-validation. If the cv argument is a specific cross validation technique, this argument is ignored. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted and which fold is currently being used for fitting - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method.
Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingCVClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] Meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/","title":"StackingCVClassifier"},{"location":"api_modules/mlxtend.classifier/StackingCVClassifier/#methods","text":"fit(X, y, groups=None, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values. groups : numpy array/None, shape = [n_samples] The group that each sample belongs to. This is used by specific folding strategies such as GroupKFold(). sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weights to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values.
Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] Predicted class labels. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns self","title":"Methods"},{"location":"api_modules/mlxtend.classifier/StackingClassifier/","text":"StackingClassifier StackingClassifier(classifiers, meta_classifier, use_probas=False, average_probas=False, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, use_clones=True) A Stacking classifier for scikit-learn estimators for classification. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_ . meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. average_probas : bool (default: False) Averages the probabilities as meta features if True. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method.
Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] Meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/ Methods fit(X, y, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_outputs] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weights to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] or [n_samples, n_outputs] Predicted class labels. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] or a list of n_outputs of such arrays if n_outputs > 1. Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns self","title":"StackingClassifier"},{"location":"api_modules/mlxtend.classifier/StackingClassifier/#stackingclassifier","text":"StackingClassifier(classifiers, meta_classifier, use_probas=False, average_probas=False, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, use_clones=True) A Stacking classifier for scikit-learn estimators for classification. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_ . meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. average_probas : bool (default: False) Averages the probabilities as meta features if True. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method.
Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] Meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/","title":"StackingClassifier"},{"location":"api_modules/mlxtend.classifier/StackingClassifier/#methods","text":"fit(X, y, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_outputs] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weights to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array.
get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] or [n_samples, n_outputs] Predicted class labels. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] or a list of n_outputs of such arrays if n_outputs > 1. Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) w.r.t. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. 
Returns self","title":"Methods"},{"location":"api_modules/mlxtend.cluster/Kmeans/","text":"Kmeans Kmeans(k, max_iter=10, convergence_tolerance=1e-05, random_seed=None, print_progress=0) K-means clustering class. Added in 0.4.1dev Parameters k : int Number of clusters max_iter : int (default: 10) Maximum number of iterations during cluster assignment. Cluster re-assignment stops automatically when the algorithm has converged. convergence_tolerance : float (default: 1e-05) Compares current centroids with centroids of the previous iteration using the given tolerance (a small positive float) to determine whether the algorithm has converged early. random_seed : int (default: None) Set random state for the initial centroid assignment. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Iterations elapsed 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes centroids_ : 2d-array, shape=[k, n_features] Feature values of the k cluster centroids. clusters_ : dictionary The cluster assignments stored as a Python dictionary; the dictionary keys denote the cluster indices and the values are Python lists of the sample indices that were assigned to each cluster. iterations_ : int Number of iterations until convergence. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Kmeans/ Methods fit(X, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. 
Returns params : mapping of string to any Parameter names mapped to their values. Adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause","title":"Kmeans"},{"location":"api_modules/mlxtend.cluster/Kmeans/#kmeans","text":"Kmeans(k, max_iter=10, convergence_tolerance=1e-05, random_seed=None, print_progress=0) K-means clustering class. Added in 0.4.1dev Parameters k : int Number of clusters max_iter : int (default: 10) Maximum number of iterations during cluster assignment. Cluster re-assignment stops automatically when the algorithm has converged. convergence_tolerance : float (default: 1e-05) Compares current centroids with centroids of the previous iteration using the given tolerance (a small positive float) to determine whether the algorithm has converged early. random_seed : int (default: None) Set random state for the initial centroid assignment. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Iterations elapsed 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes centroids_ : 2d-array, shape=[k, n_features] Feature values of the k cluster centroids. 
clusters_ : dictionary The cluster assignments stored as a Python dictionary; the dictionary keys denote the cluster indices and the values are Python lists of the sample indices that were assigned to each cluster. iterations_ : int Number of iterations until convergence. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Kmeans/","title":"Kmeans"},{"location":"api_modules/mlxtend.cluster/Kmeans/#methods","text":"fit(X, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. Adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.cluster/Kmeans/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.cluster/Kmeans/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. set_params(**params) Set the parameters of this estimator. 
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.cluster/Kmeans/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.cluster/Kmeans/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.data/autompg_data/","text":"autompg_data autompg_data() Auto MPG dataset. Source : https://archive.ics.uci.edu/ml/datasets/Auto+MPG Number of samples : 392 Continuous target variable : mpg Dataset Attributes: 1) cylinders: multi-valued discrete 2) displacement: continuous 3) horsepower: continuous 4) weight: continuous 5) acceleration: continuous 6) model year: multi-valued discrete 7) origin: multi-valued discrete 8) car name: string (unique for each instance) Returns X, y : [n_samples, n_features], [n_targets] X is the feature matrix with 392 auto samples as rows and 8 feature columns (6 rows with NaNs removed). y is a 1-dimensional array of the target MPG values. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/autompg_data/","title":"Autompg data"},{"location":"api_modules/mlxtend.data/autompg_data/#autompg_data","text":"autompg_data() Auto MPG dataset. 
Source : https://archive.ics.uci.edu/ml/datasets/Auto+MPG Number of samples : 392 Continuous target variable : mpg Dataset Attributes: 1) cylinders: multi-valued discrete 2) displacement: continuous 3) horsepower: continuous 4) weight: continuous 5) acceleration: continuous 6) model year: multi-valued discrete 7) origin: multi-valued discrete 8) car name: string (unique for each instance) Returns X, y : [n_samples, n_features], [n_targets] X is the feature matrix with 392 auto samples as rows and 8 feature columns (6 rows with NaNs removed). y is a 1-dimensional array of the target MPG values. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/autompg_data/","title":"autompg_data"},{"location":"api_modules/mlxtend.data/boston_housing_data/","text":"boston_housing_data boston_housing_data() Boston Housing dataset. Source : https://archive.ics.uci.edu/ml/datasets/Housing Number of samples : 506 Continuous target variable : MEDV MEDV = Median value of owner-occupied homes in $1000's Dataset Attributes: 1) CRIM per capita crime rate by town 2) ZN proportion of residential land zoned for lots over 25,000 sq.ft. 3) INDUS proportion of non-retail business acres per town 4) CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) 5) NOX nitric oxides concentration (parts per 10 million) 6) RM average number of rooms per dwelling 7) AGE proportion of owner-occupied units built prior to 1940 8) DIS weighted distances to five Boston employment centres 9) RAD index of accessibility to radial highways 10) TAX full-value property-tax rate per $10,000 11) PTRATIO pupil-teacher ratio by town 12) B 1000(Bk - 0.63)^2 where Bk is the prop. of b. by town 13) LSTAT % lower status of the population Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 506 housing samples as rows and 13 feature columns. 
y is a 1-dimensional array of the continuous target variable MEDV Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/boston_housing_data/","title":"Boston housing data"},{"location":"api_modules/mlxtend.data/boston_housing_data/#boston_housing_data","text":"boston_housing_data() Boston Housing dataset. Source : https://archive.ics.uci.edu/ml/datasets/Housing Number of samples : 506 Continuous target variable : MEDV MEDV = Median value of owner-occupied homes in $1000's Dataset Attributes: 1) CRIM per capita crime rate by town 2) ZN proportion of residential land zoned for lots over 25,000 sq.ft. 3) INDUS proportion of non-retail business acres per town 4) CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) 5) NOX nitric oxides concentration (parts per 10 million) 6) RM average number of rooms per dwelling 7) AGE proportion of owner-occupied units built prior to 1940 8) DIS weighted distances to five Boston employment centres 9) RAD index of accessibility to radial highways 10) TAX full-value property-tax rate per $10,000 11) PTRATIO pupil-teacher ratio by town 12) B 1000(Bk - 0.63)^2 where Bk is the prop. of b. by town 13) LSTAT % lower status of the population Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 506 housing samples as rows and 13 feature columns. y is a 1-dimensional array of the continuous target variable MEDV Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/boston_housing_data/","title":"boston_housing_data"},{"location":"api_modules/mlxtend.data/iris_data/","text":"iris_data iris_data() Iris flower dataset. Source : https://archive.ics.uci.edu/ml/datasets/Iris Number of samples : 150 Class labels : {0, 1, 2}, distribution: [50, 50, 50] 0 = setosa, 1 = versicolor, 2 = virginica. 
Dataset Attributes: 1) sepal length [cm] 2) sepal width [cm] 3) petal length [cm] 4) petal width [cm] Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 150 flower samples as rows, and 4 feature columns sepal length, sepal width, petal length, and petal width. y is a 1-dimensional array of the class labels {0, 1, 2} Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/iris_data/","title":"Iris data"},{"location":"api_modules/mlxtend.data/iris_data/#iris_data","text":"iris_data() Iris flower dataset. Source : https://archive.ics.uci.edu/ml/datasets/Iris Number of samples : 150 Class labels : {0, 1, 2}, distribution: [50, 50, 50] 0 = setosa, 1 = versicolor, 2 = virginica. Dataset Attributes: 1) sepal length [cm] 2) sepal width [cm] 3) petal length [cm] 4) petal width [cm] Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 150 flower samples as rows, and 4 feature columns sepal length, sepal width, petal length, and petal width. y is a 1-dimensional array of the class labels {0, 1, 2} Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/iris_data/","title":"iris_data"},{"location":"api_modules/mlxtend.data/loadlocal_mnist/","text":"loadlocal_mnist loadlocal_mnist(images_path, labels_path) Read MNIST from ubyte files. Parameters images_path : str path to the test or train MNIST ubyte file labels_path : str path to the test or train MNIST class labels file Returns images : [n_samples, n_pixels] numpy.array Pixel values of the images. labels : [n_samples] numpy array Target class labels Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/loadlocal_mnist/","title":"Loadlocal mnist"},{"location":"api_modules/mlxtend.data/loadlocal_mnist/#loadlocal_mnist","text":"loadlocal_mnist(images_path, labels_path) Read MNIST from ubyte files. 
Parameters images_path : str path to the test or train MNIST ubyte file labels_path : str path to the test or train MNIST class labels file Returns images : [n_samples, n_pixels] numpy.array Pixel values of the images. labels : [n_samples] numpy array Target class labels Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/loadlocal_mnist/","title":"loadlocal_mnist"},{"location":"api_modules/mlxtend.data/make_multiplexer_dataset/","text":"make_multiplexer_dataset make_multiplexer_dataset(address_bits=2, sample_size=100, positive_class_ratio=0.5, shuffle=False, random_seed=None) Function to create a binary n-bit multiplexer dataset. New in mlxtend v0.9 Parameters address_bits : int (default: 2) A positive integer that determines the number of address bits in the multiplexer, which in turn determines the n-bit capacity of the multiplexer and therefore the number of features. The number of features is determined by the number of address bits. For example, 2 address bits will result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). If address_bits=3, then this results in an 11-bit multiplexer as (3 + 2^3 = 11) with 11 features. sample_size : int (default: 100) The total number of samples generated. positive_class_ratio : float (default: 0.5) The fraction (a float between 0 and 1) of the sample_size samples in the dataset that have class label 1. If positive_class_ratio=0.5 (default), then the ratio of class 0 and class 1 samples is perfectly balanced. shuffle : bool (default: False) Whether or not to shuffle the features and labels. If False (default), the samples are returned in sorted order starting with sample_size/2 samples with class label 0 and followed by sample_size/2 samples with class label 1. random_seed : int (default: None) Random seed used for generating the multiplexer samples and shuffling. 
Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with the number of samples equal to sample_size. The number of features is determined by the number of address bits. For instance, 2 address bits will result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). All features are binary (values in {0, 1}). y is a 1-dimensional array of class labels in {0, 1}. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/make_multiplexer_dataset","title":"Make multiplexer dataset"},{"location":"api_modules/mlxtend.data/make_multiplexer_dataset/#make_multiplexer_dataset","text":"make_multiplexer_dataset(address_bits=2, sample_size=100, positive_class_ratio=0.5, shuffle=False, random_seed=None) Function to create a binary n-bit multiplexer dataset. New in mlxtend v0.9 Parameters address_bits : int (default: 2) A positive integer that determines the number of address bits in the multiplexer, which in turn determines the n-bit capacity of the multiplexer and therefore the number of features. The number of features is determined by the number of address bits. For example, 2 address bits will result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). If address_bits=3, then this results in an 11-bit multiplexer as (3 + 2^3 = 11) with 11 features. sample_size : int (default: 100) The total number of samples generated. positive_class_ratio : float (default: 0.5) The fraction (a float between 0 and 1) of the sample_size samples in the dataset that have class label 1. If positive_class_ratio=0.5 (default), then the ratio of class 0 and class 1 samples is perfectly balanced. shuffle : bool (default: False) Whether or not to shuffle the features and labels. If False (default), the samples are returned in sorted order starting with sample_size/2 samples with class label 0 and followed by sample_size/2 samples with class label 1. 
random_seed : int (default: None) Random seed used for generating the multiplexer samples and shuffling. Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with the number of samples equal to sample_size. The number of features is determined by the number of address bits. For instance, 2 address bits will result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). All features are binary (values in {0, 1}). y is a 1-dimensional array of class labels in {0, 1}. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/make_multiplexer_dataset","title":"make_multiplexer_dataset"},{"location":"api_modules/mlxtend.data/mnist_data/","text":"mnist_data mnist_data() 5000 samples from the MNIST handwritten digits dataset. Data Source : http://yann.lecun.com/exdb/mnist/ Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 5000 image samples as rows, each row consisting of 28x28 pixels unrolled into a 784-pixel feature vector. y contains the 10 unique class labels 0-9. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/mnist_data/","title":"Mnist data"},{"location":"api_modules/mlxtend.data/mnist_data/#mnist_data","text":"mnist_data() 5000 samples from the MNIST handwritten digits dataset. Data Source : http://yann.lecun.com/exdb/mnist/ Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 5000 image samples as rows, each row consisting of 28x28 pixels unrolled into a 784-pixel feature vector. y contains the 10 unique class labels 0-9. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/mnist_data/","title":"mnist_data"},{"location":"api_modules/mlxtend.data/three_blobs_data/","text":"three_blobs_data three_blobs_data() A random dataset of 3 2D blobs for clustering. 
Number of samples : 150 Suggested labels : {0, 1, 2}, distribution: [50, 50, 50] Returns X, y : [n_samples, n_features], [n_cluster_labels] X is the feature matrix with 150 samples as rows and 2 feature columns. y is a 1-dimensional array of the 3 suggested cluster labels 0, 1, 2 Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/three_blobs_data","title":"Three blobs data"},{"location":"api_modules/mlxtend.data/three_blobs_data/#three_blobs_data","text":"three_blobs_data() A random dataset of 3 2D blobs for clustering. Number of samples : 150 Suggested labels : {0, 1, 2}, distribution: [50, 50, 50] Returns X, y : [n_samples, n_features], [n_cluster_labels] X is the feature matrix with 150 samples as rows and 2 feature columns. y is a 1-dimensional array of the 3 suggested cluster labels 0, 1, 2 Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/three_blobs_data","title":"three_blobs_data"},{"location":"api_modules/mlxtend.data/wine_data/","text":"wine_data wine_data() Wine dataset. Source : https://archive.ics.uci.edu/ml/datasets/Wine Number of samples : 178 Class labels : {0, 1, 2}, distribution: [59, 71, 48] Dataset Attributes: 1) Alcohol 2) Malic acid 3) Ash 4) Alcalinity of ash 5) Magnesium 6) Total phenols 7) Flavanoids 8) Nonflavanoid phenols 9) Proanthocyanins 10) Color intensity 11) Hue 12) OD280/OD315 of diluted wines 13) Proline Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 178 wine samples as rows and 13 feature columns. y is a 1-dimensional array of the 3 class labels 0, 1, 2 Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/wine_data","title":"Wine data"},{"location":"api_modules/mlxtend.data/wine_data/#wine_data","text":"wine_data() Wine dataset. 
Source : https://archive.ics.uci.edu/ml/datasets/Wine Number of samples : 178 Class labels : {0, 1, 2}, distribution: [59, 71, 48] Dataset Attributes: 1) Alcohol 2) Malic acid 3) Ash 4) Alcalinity of ash 5) Magnesium 6) Total phenols 7) Flavanoids 8) Nonflavanoid phenols 9) Proanthocyanins 10) Color intensity 11) Hue 12) OD280/OD315 of diluted wines 13) Proline Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 178 wine samples as rows and 13 feature columns. y is a 1-dimensional array of the 3 class labels 0, 1, 2 Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/wine_data","title":"wine_data"},{"location":"api_modules/mlxtend.evaluate/BootstrapOutOfBag/","text":"BootstrapOutOfBag BootstrapOutOfBag(n_splits=200, random_seed=None) Parameters n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. Returns train_idx : ndarray The training set indices for that split. test_idx : ndarray The testing set indices for that split. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/BootstrapOutOfBag/ Methods get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility with scikit-learn. y : object Always ignored, exists for compatibility with scikit-learn. groups : object Always ignored, exists for compatibility with scikit-learn. Returns n_splits : int Returns the number of splitting iterations in the cross-validator. split(X, y=None, groups=None) y : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn. 
groups : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn.","title":"BootstrapOutOfBag"},{"location":"api_modules/mlxtend.evaluate/BootstrapOutOfBag/#bootstrapoutofbag","text":"BootstrapOutOfBag(n_splits=200, random_seed=None) Parameters n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. Returns train_idx : ndarray The training set indices for that split. test_idx : ndarray The testing set indices for that split. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/BootstrapOutOfBag/","title":"BootstrapOutOfBag"},{"location":"api_modules/mlxtend.evaluate/BootstrapOutOfBag/#methods","text":"get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility with scikit-learn. y : object Always ignored, exists for compatibility with scikit-learn. groups : object Always ignored, exists for compatibility with scikit-learn. Returns n_splits : int Returns the number of splitting iterations in the cross-validator. split(X, y=None, groups=None) y : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn. groups : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn.","title":"Methods"},{"location":"api_modules/mlxtend.evaluate/PredefinedHoldoutSplit/","text":"PredefinedHoldoutSplit PredefinedHoldoutSplit(valid_indices) Train/Validation set splitter for sklearn's GridSearchCV etc. Uses user-specified train/validation set indices to split a dataset into train/validation sets using user-defined or random indices. 
Parameters valid_indices : array-like, shape (num_examples,) Indices of the training examples in the training set to be used for validation. All other indices in the training set are used to form a training subset for model fitting. Methods get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. valid_index : ndarray The validation set indices for that split.","title":"PredefinedHoldoutSplit"},{"location":"api_modules/mlxtend.evaluate/PredefinedHoldoutSplit/#predefinedholdoutsplit","text":"PredefinedHoldoutSplit(valid_indices) Train/Validation set splitter for sklearn's GridSearchCV etc. Uses user-specified train/validation set indices to split a dataset into train/validation sets using user-defined or random indices. Parameters valid_indices : array-like, shape (num_examples,) Indices of the training examples in the training set to be used for validation. 
All other indices in the training set are used to form a training subset for model fitting.","title":"PredefinedHoldoutSplit"},{"location":"api_modules/mlxtend.evaluate/PredefinedHoldoutSplit/#methods","text":"get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. valid_index : ndarray The validation set indices for that split.","title":"Methods"},{"location":"api_modules/mlxtend.evaluate/RandomHoldoutSplit/","text":"RandomHoldoutSplit RandomHoldoutSplit(valid_size=0.5, random_seed=None, stratify=False) Train/Validation set splitter for sklearn's GridSearchCV etc. Provides train/validation set indices to split a dataset into train/validation sets using random indices. Parameters valid_size : float (default: 0.5) Proportion of examples that are assigned as validation examples. The remaining proportion, 1 - valid_size, is automatically assigned as training set examples. random_seed : int (default: None) The random seed for splitting the data into training and validation set partitions. 
stratify : bool (default: False) True or False, whether to perform a stratified split or not Methods get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of training examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. valid_index : ndarray The validation set indices for that split.","title":"RandomHoldoutSplit"},{"location":"api_modules/mlxtend.evaluate/RandomHoldoutSplit/#randomholdoutsplit","text":"RandomHoldoutSplit(valid_size=0.5, random_seed=None, stratify=False) Train/Validation set splitter for sklearn's GridSearchCV etc. Provides train/validation set indices to split a dataset into train/validation sets using random indices. Parameters valid_size : float (default: 0.5) Proportion of examples that are assigned as validation examples. The remaining proportion, 1 - valid_size, is automatically assigned as training set examples. random_seed : int (default: None) The random seed for splitting the data into training and validation set partitions. 
stratify : bool (default: False) True or False, whether to perform a stratified split or not","title":"RandomHoldoutSplit"},{"location":"api_modules/mlxtend.evaluate/RandomHoldoutSplit/#methods","text":"get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of training examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. valid_index : ndarray The validation set indices for that split.","title":"Methods"},{"location":"api_modules/mlxtend.evaluate/bootstrap/","text":"bootstrap bootstrap(x, func, num_rounds=1000, ci=0.95, ddof=1, seed=None) Implements the ordinary nonparametric bootstrap Parameters x : NumPy array, shape=(n_samples, [n_columns]) A one- or multidimensional array of data records func : A function which computes a statistic that is used to compute the bootstrap replicates (the statistic computed from the bootstrap samples). This function must return a scalar value. For example, np.mean or np.median would be an acceptable argument for func if x is a 1-dimensional array or vector. num_rounds : int (default=1000) The number of bootstrap samples to draw where each bootstrap sample has the same number of records as the original dataset. 
ci : float (default=0.95) A float in the range (0, 1) that represents the confidence level for computing the confidence interval. For example, ci=0.95 (default) will compute the 95% confidence interval from the bootstrap replicates. ddof : int (default=1) The delta degrees of freedom used when computing the standard error. seed : int or None (default=None) Random seed for generating bootstrap samples. Returns original, standard_error, (lower_ci, upper_ci) : tuple Returns the statistic of the original sample (original), the standard error of the estimate, and the respective confidence interval bounds. Examples >>> import numpy as np >>> from mlxtend.evaluate import bootstrap >>> rng = np.random.RandomState(123) >>> x = rng.normal(loc=5., size=100) >>> original, std_err, ci_bounds = bootstrap(x, ... num_rounds=1000, ... func=np.mean, ... ci=0.95, ... seed=123) >>> print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, ... std_err, ... ci_bounds[0], ... ci_bounds[1])) Mean: 5.03, SE: +/- 0.11, CI95: [4.80, 5.26] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap/","title":"Bootstrap"},{"location":"api_modules/mlxtend.evaluate/bootstrap/#bootstrap","text":"bootstrap(x, func, num_rounds=1000, ci=0.95, ddof=1, seed=None) Implements the ordinary nonparametric bootstrap. Parameters x : NumPy array, shape=(n_samples, [n_columns]) A one- or multidimensional array of data records. func : callable A function that computes a statistic that is used to compute the bootstrap replicates (the statistic computed from the bootstrap samples). This function must return a scalar value. For example, np.mean or np.median would be an acceptable argument for func if x is a 1-dimensional array or vector. num_rounds : int (default=1000) The number of bootstrap samples to draw, where each bootstrap sample has the same number of records as the original dataset. 
ci : float (default=0.95) A float in the range (0, 1) that represents the confidence level for computing the confidence interval. For example, ci=0.95 (default) will compute the 95% confidence interval from the bootstrap replicates. ddof : int (default=1) The delta degrees of freedom used when computing the standard error. seed : int or None (default=None) Random seed for generating bootstrap samples. Returns original, standard_error, (lower_ci, upper_ci) : tuple Returns the statistic of the original sample (original), the standard error of the estimate, and the respective confidence interval bounds. Examples >>> import numpy as np >>> from mlxtend.evaluate import bootstrap >>> rng = np.random.RandomState(123) >>> x = rng.normal(loc=5., size=100) >>> original, std_err, ci_bounds = bootstrap(x, ... num_rounds=1000, ... func=np.mean, ... ci=0.95, ... seed=123) >>> print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, ... std_err, ... ci_bounds[0], ... ci_bounds[1])) Mean: 5.03, SE: +/- 0.11, CI95: [4.80, 5.26] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap/","title":"bootstrap"},{"location":"api_modules/mlxtend.evaluate/bootstrap_point632_score/","text":"bootstrap_point632_score bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632', scoring_func=None, random_seed=None, clone_estimator=True) Implementation of the .632 [1] and .632+ [2] bootstrap for supervised learning. References: [1] Efron, Bradley. 1983. \"Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.\" Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636. [2] Efron, Bradley, and Robert Tibshirani. 1997. \"Improvements on Cross-Validation: The .632+ Bootstrap Method.\" Journal of the American Statistical Association 92 (438): 548. doi:10.2307/2965703. 
Parameters estimator : object An estimator for classification or regression that follows the scikit-learn API and implements \"fit\" and \"predict\" methods. X : array-like The data to fit. Can be, for example, a list or an array that is at least 2d. y : array-like, optional, default: None The target variable to try to predict in the case of supervised learning. n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. method : str (default='.632') The bootstrap method, which can be one of - 1) '.632' bootstrap (default) - 2) '.632+' bootstrap - 3) 'oob' (regular out-of-bag, no weighting) for comparison studies. scoring_func : callable (default=None) Score function (or loss function) with signature scoring_func(y, y_pred, **kwargs). If None, uses classification accuracy if the estimator is a classifier and mean squared error if the estimator is a regressor. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. clone_estimator : bool (default=True) Clones the estimator if True, otherwise fits the original. Returns scores : array of float, shape=(n_splits,) Array of scores of the estimator for each bootstrap replicate. 
Examples >>> import numpy as np >>> from sklearn import datasets, linear_model >>> from mlxtend.evaluate import bootstrap_point632_score >>> iris = datasets.load_iris() >>> X = iris.data >>> y = iris.target >>> lr = linear_model.LogisticRegression() >>> scores = bootstrap_point632_score(lr, X, y) >>> acc = np.mean(scores) >>> print('Accuracy:', acc) Accuracy: 0.953023146884 >>> lower = np.percentile(scores, 2.5) >>> upper = np.percentile(scores, 97.5) >>> print('95%% Confidence interval: [%.2f, %.2f]' % (lower, upper)) 95% Confidence interval: [0.90, 0.98] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap_point632_score/","title":"Bootstrap point632 score"},{"location":"api_modules/mlxtend.evaluate/bootstrap_point632_score/#bootstrap_point632_score","text":"bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632', scoring_func=None, random_seed=None, clone_estimator=True) Implementation of the .632 [1] and .632+ [2] bootstrap for supervised learning. References: [1] Efron, Bradley. 1983. \"Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.\" Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636. [2] Efron, Bradley, and Robert Tibshirani. 1997. \"Improvements on Cross-Validation: The .632+ Bootstrap Method.\" Journal of the American Statistical Association 92 (438): 548. doi:10.2307/2965703. Parameters estimator : object An estimator for classification or regression that follows the scikit-learn API and implements \"fit\" and \"predict\" methods. X : array-like The data to fit. Can be, for example, a list or an array that is at least 2d. y : array-like, optional, default: None The target variable to try to predict in the case of supervised learning. n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. 
method : str (default='.632') The bootstrap method, which can be one of - 1) '.632' bootstrap (default) - 2) '.632+' bootstrap - 3) 'oob' (regular out-of-bag, no weighting) for comparison studies. scoring_func : callable (default=None) Score function (or loss function) with signature scoring_func(y, y_pred, **kwargs). If None, uses classification accuracy if the estimator is a classifier and mean squared error if the estimator is a regressor. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. clone_estimator : bool (default=True) Clones the estimator if True, otherwise fits the original. Returns scores : array of float, shape=(n_splits,) Array of scores of the estimator for each bootstrap replicate. Examples >>> import numpy as np >>> from sklearn import datasets, linear_model >>> from mlxtend.evaluate import bootstrap_point632_score >>> iris = datasets.load_iris() >>> X = iris.data >>> y = iris.target >>> lr = linear_model.LogisticRegression() >>> scores = bootstrap_point632_score(lr, X, y) >>> acc = np.mean(scores) >>> print('Accuracy:', acc) Accuracy: 0.953023146884 >>> lower = np.percentile(scores, 2.5) >>> upper = np.percentile(scores, 97.5) >>> print('95%% Confidence interval: [%.2f, %.2f]' % (lower, upper)) 95% Confidence interval: [0.90, 0.98] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap_point632_score/","title":"bootstrap_point632_score"},{"location":"api_modules/mlxtend.evaluate/cochrans_q/","text":"cochrans_q cochrans_q(y_target, *y_model_predictions) Cochran's Q test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy arrays. 
Returns q, p : float or None, float Returns the Q (chi-squared) value and the p-value. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/cochrans_q/","title":"Cochrans q"},{"location":"api_modules/mlxtend.evaluate/cochrans_q/#cochrans_q","text":"cochrans_q(y_target, *y_model_predictions) Cochran's Q test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy arrays. Returns q, p : float or None, float Returns the Q (chi-squared) value and the p-value. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/cochrans_q/","title":"cochrans_q"},{"location":"api_modules/mlxtend.evaluate/combined_ftest_5x2cv/","text":"combined_ftest_5x2cv combined_ftest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv combined F test proposed by Alpaydin (1999) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. 
If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns f : float The F-statistic. pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/combined_ftest_5x2cv/","title":"Combined ftest 5x2cv"},{"location":"api_modules/mlxtend.evaluate/combined_ftest_5x2cv/#combined_ftest_5x2cv","text":"combined_ftest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv combined F test proposed by Alpaydin (1999) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. 
random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns f : float The F-statistic. pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/combined_ftest_5x2cv/","title":"combined_ftest_5x2cv"},{"location":"api_modules/mlxtend.evaluate/confusion_matrix/","text":"confusion_matrix confusion_matrix(y_target, y_predicted, binary=False, positive_label=1) Compute a confusion matrix/contingency table. Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: False) Maps a multi-class problem onto a binary confusion matrix, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. Returns mat : array-like, shape=[n_classes, n_classes] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix/","title":"Confusion matrix"},{"location":"api_modules/mlxtend.evaluate/confusion_matrix/#confusion_matrix","text":"confusion_matrix(y_target, y_predicted, binary=False, positive_label=1) Compute a confusion matrix/contingency table. Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: False) Maps a multi-class problem onto a binary confusion matrix, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. 
Returns mat : array-like, shape=[n_classes, n_classes] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix/","title":"confusion_matrix"},{"location":"api_modules/mlxtend.evaluate/feature_importance_permutation/","text":"feature_importance_permutation feature_importance_permutation(X, y, predict_method, metric, num_rounds=1, seed=None) Feature importance estimation via permutation importance. Parameters X : NumPy array, shape = [n_samples, n_features] Dataset, where n_samples is the number of samples and n_features is the number of features. y : NumPy array, shape = [n_samples] Target values. predict_method : prediction function A callable function that predicts the target values from X. metric : str, callable The metric for evaluating the feature importance through permutation. The string 'accuracy' is recommended for classifiers and the string 'r2' for regressors. Optionally, a custom scoring function (e.g., metric=scoring_func) that accepts two arguments, y_true and y_pred, with shapes similar to the y array, can be provided. num_rounds : int (default=1) Number of rounds the feature columns are permuted to compute the permutation importance. seed : int or None (default=None) Random seed for permuting the feature columns. Returns mean_importance_vals, all_importance_vals : NumPy arrays. The first array, mean_importance_vals, has shape [n_features, ] and contains the importance values for all features. The second array, all_importance_vals, has shape [n_features, num_rounds] and contains the feature importance values for each repetition. If num_rounds=1, it contains the same values as the first array, mean_importance_vals. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/","title":"Feature importance permutation"},{"location":"api_modules/mlxtend.evaluate/feature_importance_permutation/#feature_importance_permutation","text":"feature_importance_permutation(X, y, predict_method, metric, num_rounds=1, seed=None) Feature importance estimation via permutation importance. Parameters X : NumPy array, shape = [n_samples, n_features] Dataset, where n_samples is the number of samples and n_features is the number of features. y : NumPy array, shape = [n_samples] Target values. predict_method : prediction function A callable function that predicts the target values from X. metric : str, callable The metric for evaluating the feature importance through permutation. The string 'accuracy' is recommended for classifiers and the string 'r2' for regressors. Optionally, a custom scoring function (e.g., metric=scoring_func) that accepts two arguments, y_true and y_pred, with shapes similar to the y array, can be provided. num_rounds : int (default=1) Number of rounds the feature columns are permuted to compute the permutation importance. seed : int or None (default=None) Random seed for permuting the feature columns. Returns mean_importance_vals, all_importance_vals : NumPy arrays. The first array, mean_importance_vals, has shape [n_features, ] and contains the importance values for all features. The second array, all_importance_vals, has shape [n_features, num_rounds] and contains the feature importance values for each repetition. If num_rounds=1, it contains the same values as the first array, mean_importance_vals. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/","title":"feature_importance_permutation"},{"location":"api_modules/mlxtend.evaluate/ftest/","text":"ftest ftest(y_target, *y_model_predictions) F-test to compare 2 or more models. 
Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy arrays. Returns f, p : float or None, float Returns the F-value and the p-value. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/ftest/","title":"Ftest"},{"location":"api_modules/mlxtend.evaluate/ftest/#ftest","text":"ftest(y_target, *y_model_predictions) F-test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy arrays. Returns f, p : float or None, float Returns the F-value and the p-value. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/ftest/","title":"ftest"},{"location":"api_modules/mlxtend.evaluate/lift_score/","text":"lift_score lift_score(y_target, y_predicted, binary=True, positive_label=1) Lift measures the degree to which the predictions of a classification model are better than randomly-generated predictions. In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), the lift score is computed as: [ TP / (TP+FP) ] / [ (TP+FN) / (TP+TN+FP+FN) ] Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: True) Maps a multi-class problem onto a binary problem, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. 
Returns score : float Lift score in the range [0, \\infty) Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/lift_score/","title":"Lift score"},{"location":"api_modules/mlxtend.evaluate/lift_score/#lift_score","text":"lift_score(y_target, y_predicted, binary=True, positive_label=1) Lift measures the degree to which the predictions of a classification model are better than randomly-generated predictions. In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), the lift score is computed as: [ TP / (TP+FP) ] / [ (TP+FN) / (TP+TN+FP+FN) ] Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: True) Maps a multi-class problem onto a binary problem, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. Returns score : float Lift score in the range [0, \\infty) Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/lift_score/","title":"lift_score"},{"location":"api_modules/mlxtend.evaluate/mcnemar/","text":"mcnemar mcnemar(ary, corrected=True, exact=False) McNemar test for paired nominal data. Parameters ary : array-like, shape=[2, 2] 2 x 2 contingency table (as returned by evaluate.mcnemar_table), where a: ary[0, 0]: # of samples that both models predicted correctly b: ary[0, 1]: # of samples that model 1 got right and model 2 got wrong c: ary[1, 0]: # of samples that model 2 got right and model 1 got wrong d: ary[1, 1]: # of samples that both models predicted incorrectly corrected : bool (default: True) Uses Edwards' continuity correction for chi-squared if True. exact : bool (default: False) If True, uses an exact binomial test comparing b to a binomial distribution with n = b + c and p = 0.5. 
It is highly recommended to use exact=True for sample sizes < 25, since the chi-squared statistic is not well approximated by the chi-squared distribution in that case. Returns chi2, p : float or None, float Returns the chi-squared value and the p-value; if exact=True (default: False), chi2 is None. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/","title":"Mcnemar"},{"location":"api_modules/mlxtend.evaluate/mcnemar/#mcnemar","text":"mcnemar(ary, corrected=True, exact=False) McNemar test for paired nominal data. Parameters ary : array-like, shape=[2, 2] 2 x 2 contingency table (as returned by evaluate.mcnemar_table), where a: ary[0, 0]: # of samples that both models predicted correctly b: ary[0, 1]: # of samples that model 1 got right and model 2 got wrong c: ary[1, 0]: # of samples that model 2 got right and model 1 got wrong d: ary[1, 1]: # of samples that both models predicted incorrectly corrected : bool (default: True) Uses Edwards' continuity correction for chi-squared if True. exact : bool (default: False) If True, uses an exact binomial test comparing b to a binomial distribution with n = b + c and p = 0.5. It is highly recommended to use exact=True for sample sizes < 25, since the chi-squared statistic is not well approximated by the chi-squared distribution in that case. Returns chi2, p : float or None, float Returns the chi-squared value and the p-value; if exact=True (default: False), chi2 is None. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/","title":"mcnemar"},{"location":"api_modules/mlxtend.evaluate/mcnemar_table/","text":"mcnemar_table mcnemar_table(y_target, y_model1, y_model2) Compute a 2x2 contingency table for McNemar's test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. 
y_model1 : array-like, shape=[n_samples] Predicted class labels from model 1 as 1D NumPy array. y_model2 : array-like, shape=[n_samples] Predicted class labels from model 2 as 1D NumPy array. Returns tb : array-like, shape=[2, 2] 2x2 contingency table with the following contents: a: tb[0, 0]: # of samples that both models predicted correctly b: tb[0, 1]: # of samples that model 1 got right and model 2 got wrong c: tb[1, 0]: # of samples that model 2 got right and model 1 got wrong d: tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_table/","title":"Mcnemar table"},{"location":"api_modules/mlxtend.evaluate/mcnemar_table/#mcnemar_table","text":"mcnemar_table(y_target, y_model1, y_model2) Compute a 2x2 contingency table for McNemar's test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. y_model1 : array-like, shape=[n_samples] Predicted class labels from model 1 as 1D NumPy array. y_model2 : array-like, shape=[n_samples] Predicted class labels from model 2 as 1D NumPy array. Returns tb : array-like, shape=[2, 2] 2x2 contingency table with the following contents: a: tb[0, 0]: # of samples that both models predicted correctly b: tb[0, 1]: # of samples that model 1 got right and model 2 got wrong c: tb[1, 0]: # of samples that model 2 got right and model 1 got wrong d: tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_table/","title":"mcnemar_table"},{"location":"api_modules/mlxtend.evaluate/mcnemar_tables/","text":"mcnemar_tables mcnemar_tables(y_target, *y_model_predictions) Compute multiple 2x2 contingency tables for McNemar's test or Cochran's Q test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. 
*y_model_predictions : array-likes, shape=[n_samples] Predicted class labels from the models as 1D NumPy arrays. Returns tables : dict Dictionary of NumPy arrays with shape=[2, 2]. Each dictionary key names the two models to be compared based on the order the models were passed as *y_model_predictions. The number of dictionary entries is equal to the number of pairwise combinations between the m models, i.e., \"m choose 2.\" For example, the following target array (containing the true labels) and 3 models y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0]) would result in the following dictionary: {'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]]), 'model_0 vs model_2': array([[ 3., 0.], [ 3., 4.]]), 'model_1 vs model_2': array([[ 3., 0.], [ 2., 5.]])} Each array is structured in the following way: tb[0, 0]: # of samples that both models predicted correctly tb[0, 1]: # of samples that model a got right and model b got wrong tb[1, 0]: # of samples that model b got right and model a got wrong tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/","title":"Mcnemar tables"},{"location":"api_modules/mlxtend.evaluate/mcnemar_tables/#mcnemar_tables","text":"mcnemar_tables(y_target, *y_model_predictions) Compute multiple 2x2 contingency tables for McNemar's test or Cochran's Q test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Predicted class labels from the models as 1D NumPy arrays. Returns tables : dict Dictionary of NumPy arrays with shape=[2, 2]. Each dictionary key names the two models to be compared based on the order the models were passed as *y_model_predictions. 
The number of dictionary entries is equal to the number of pairwise combinations between the m models, i.e., \"m choose 2.\" For example, the following target array (containing the true labels) and 3 models y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0]) would result in the following dictionary: {'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]]), 'model_0 vs model_2': array([[ 3., 0.], [ 3., 4.]]), 'model_1 vs model_2': array([[ 3., 0.], [ 2., 5.]])} Each array is structured in the following way: tb[0, 0]: # of samples that both models predicted correctly tb[0, 1]: # of samples that model a got right and model b got wrong tb[1, 0]: # of samples that model b got right and model a got wrong tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/","title":"mcnemar_tables"},{"location":"api_modules/mlxtend.evaluate/paired_ttest_5x2cv/","text":"paired_ttest_5x2cv paired_ttest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv paired t test proposed by Dietterich (1998) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. 
If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic. pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_5x2cv/","title":"Paired ttest 5x2cv"},{"location":"api_modules/mlxtend.evaluate/paired_ttest_5x2cv/#paired_ttest_5x2cv","text":"paired_ttest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv paired t test proposed by Dietterich (1998) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. 
If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic. pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_5x2cv/","title":"paired_ttest_5x2cv"},{"location":"api_modules/mlxtend.evaluate/paired_ttest_kfold_cv/","text":"paired_ttest_kfold_cv paired_ttest_kfold_cv(estimator1, estimator2, X, y, cv=10, scoring=None, shuffle=False, random_seed=None) Implements the k-fold paired t test procedure to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. cv : int (default: 10) Number of splits and iterations for the cross-validation procedure. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. 
If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. shuffle : bool (default: False) Whether to shuffle the dataset for generating the k-fold splits. random_seed : int or None (default: None) Random seed for shuffling the dataset for generating the k-fold splits. Ignored if shuffle=False. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_kfold_cv/","title":"Paired ttest kfold cv"},{"location":"api_modules/mlxtend.evaluate/paired_ttest_kfold_cv/#paired_ttest_kfold_cv","text":"paired_ttest_kfold_cv(estimator1, estimator2, X, y, cv=10, scoring=None, shuffle=False, random_seed=None) Implements the k-fold paired t test procedure to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. cv : int (default: 10) Number of splits and iterations for the cross-validation procedure. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. 
If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. shuffle : bool (default: False) Whether to shuffle the dataset for generating the k-fold splits. random_seed : int or None (default: None) Random seed for shuffling the dataset for generating the k-fold splits. Ignored if shuffle=False. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_kfold_cv/","title":"paired_ttest_kfold_cv"},{"location":"api_modules/mlxtend.evaluate/paired_ttest_resampled/","text":"paired_ttest_resampled paired_ttest_resampled(estimator1, estimator2, X, y, num_rounds=30, test_size=0.3, scoring=None, random_seed=None) Implements the resampled paired t test procedure to compare the performance of two models (also called k-hold-out paired t test). Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. num_rounds : int (default: 30) Number of resampling iterations (i.e., train/test splits). test_size : float or int (default: 0.3) If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to use as a test set. If int, represents the absolute number of test examples. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. 
If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_resampled/","title":"Paired ttest resampled"},{"location":"api_modules/mlxtend.evaluate/paired_ttest_resampled/#paired_ttest_resampled","text":"paired_ttest_resampled(estimator1, estimator2, X, y, num_rounds=30, test_size=0.3, scoring=None, random_seed=None) Implements the resampled paired t test procedure to compare the performance of two models (also called k-hold-out paired t test). Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. num_rounds : int (default: 30) Number of resampling iterations (i.e., train/test splits). test_size : float or int (default: 0.3) If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to use as a test set. If int, represents the absolute number of test examples. 
scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_resampled/","title":"paired_ttest_resampled"},{"location":"api_modules/mlxtend.evaluate/permutation_test/","text":"permutation_test permutation_test(x, y, func='x_mean != y_mean', method='exact', num_rounds=1000, seed=None) Nonparametric permutation test Parameters x : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the first sample (e.g., the treatment group). y : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the second sample (e.g., the control group). func : custom function or str (default: 'x_mean != y_mean') Function to compute the statistic for the permutation test. - If 'x_mean != y_mean', uses func=lambda x, y: np.abs(np.mean(x) - np.mean(y)) for a two-sided test. - If 'x_mean > y_mean', uses func=lambda x, y: np.mean(x) - np.mean(y) for a one-sided test. - If 'x_mean < y_mean', uses func=lambda x, y: np.mean(y) - np.mean(x) for a one-sided test. 
method : 'approximate' or 'exact' (default: 'exact') If 'exact' (default), all possible permutations are considered. If 'approximate', the number of drawn samples is given by num_rounds. Note that 'exact' is typically not feasible unless the dataset size is relatively small. num_rounds : int (default: 1000) The number of permutation samples if method='approximate'. seed : int or None (default: None) The random seed for generating permutation samples if method='approximate'. Returns p-value under the null hypothesis. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/","title":"Permutation test"},{"location":"api_modules/mlxtend.evaluate/permutation_test/#permutation_test","text":"permutation_test(x, y, func='x_mean != y_mean', method='exact', num_rounds=1000, seed=None) Nonparametric permutation test Parameters x : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the first sample (e.g., the treatment group). y : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the second sample (e.g., the control group). func : custom function or str (default: 'x_mean != y_mean') Function to compute the statistic for the permutation test. - If 'x_mean != y_mean', uses func=lambda x, y: np.abs(np.mean(x) - np.mean(y)) for a two-sided test. - If 'x_mean > y_mean', uses func=lambda x, y: np.mean(x) - np.mean(y) for a one-sided test. - If 'x_mean < y_mean', uses func=lambda x, y: np.mean(y) - np.mean(x) for a one-sided test. method : 'approximate' or 'exact' (default: 'exact') If 'exact' (default), all possible permutations are considered. If 'approximate', the number of drawn samples is given by num_rounds. Note that 'exact' is typically not feasible unless the dataset size is relatively small. num_rounds : int (default: 1000) The number of permutation samples if method='approximate'. 
seed : int or None (default: None) The random seed for generating permutation samples if method='approximate'. Returns p-value under the null hypothesis. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/","title":"permutation_test"},{"location":"api_modules/mlxtend.evaluate/proportion_difference/","text":"proportion_difference proportion_difference(proportion_1, proportion_2, n_1, n_2=None) Computes the test statistic and p-value for a difference of proportions test. Parameters proportion_1 : float The first proportion. proportion_2 : float The second proportion. n_1 : int The sample size of the first test sample. n_2 : int or None (default=None) The sample size of the second test sample. If None, n_1 = n_2. Returns z, p : float or None, float Returns the z-score and the p-value. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/proportion_difference/","title":"Proportion difference"},{"location":"api_modules/mlxtend.evaluate/proportion_difference/#proportion_difference","text":"proportion_difference(proportion_1, proportion_2, n_1, n_2=None) Computes the test statistic and p-value for a difference of proportions test. Parameters proportion_1 : float The first proportion. proportion_2 : float The second proportion. n_1 : int The sample size of the first test sample. n_2 : int or None (default=None) The sample size of the second test sample. If None, n_1 = n_2. Returns z, p : float or None, float Returns the z-score and the p-value. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/proportion_difference/","title":"proportion_difference"},{"location":"api_modules/mlxtend.evaluate/scoring/","text":"scoring scoring(y_target, y_predicted, metric='error', positive_label=1, unique_labels='auto') Compute a scoring metric for supervised learning. Parameters y_target : array-like, shape=[n_values] True class labels or target values. 
y_predicted : array-like, shape=[n_values] Predicted class labels or target values. metric : str (default: 'error') Performance metric: 'accuracy': (TP + TN)/(FP + FN + TP + TN) = 1-ERR 'per-class accuracy': Average per-class accuracy 'per-class error': Average per-class error 'error': (FP + FN)/(FP + FN + TP + TN) = 1-ACC 'false_positive_rate': FP/N = FP/(FP + TN) 'true_positive_rate': TP/P = TP/(FN + TP) 'true_negative_rate': TN/N = TN/(FP + TN) 'precision': TP/(TP + FP) 'recall': equal to 'true_positive_rate' 'sensitivity': equal to 'true_positive_rate' or 'recall' 'specificity': equal to 'true_negative_rate' 'f1': 2 * (PRE * REC)/(PRE + REC) 'matthews_corr_coef': (TP*TN - FP*FN) / sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)} Where: [TP: True positives, TN: True negatives, FP: False positives, FN: False negatives] positive_label : int (default: 1) Label of the positive class for binary classification metrics. unique_labels : str or array-like (default: 'auto') If 'auto', deduces the unique class labels from y_target. Returns score : float Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/scoring/","title":"Scoring"},{"location":"api_modules/mlxtend.evaluate/scoring/#scoring","text":"scoring(y_target, y_predicted, metric='error', positive_label=1, unique_labels='auto') Compute a scoring metric for supervised learning. Parameters y_target : array-like, shape=[n_values] True class labels or target values. y_predicted : array-like, shape=[n_values] Predicted class labels or target values. 
metric : str (default: 'error') Performance metric: 'accuracy': (TP + TN)/(FP + FN + TP + TN) = 1-ERR 'per-class accuracy': Average per-class accuracy 'per-class error': Average per-class error 'error': (FP + FN)/(FP + FN + TP + TN) = 1-ACC 'false_positive_rate': FP/N = FP/(FP + TN) 'true_positive_rate': TP/P = TP/(FN + TP) 'true_negative_rate': TN/N = TN/(FP + TN) 'precision': TP/(TP + FP) 'recall': equal to 'true_positive_rate' 'sensitivity': equal to 'true_positive_rate' or 'recall' 'specificity': equal to 'true_negative_rate' 'f1': 2 * (PRE * REC)/(PRE + REC) 'matthews_corr_coef': (TP*TN - FP*FN) / sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)} Where: [TP: True positives, TN: True negatives, FP: False positives, FN: False negatives] positive_label : int (default: 1) Label of the positive class for binary classification metrics. unique_labels : str or array-like (default: 'auto') If 'auto', deduces the unique class labels from y_target. Returns score : float Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/scoring/","title":"scoring"},{"location":"api_modules/mlxtend.feature_extraction/LinearDiscriminantAnalysis/","text":"LinearDiscriminantAnalysis LinearDiscriminantAnalysis(n_discriminants=None) Linear Discriminant Analysis Class Parameters n_discriminants : int (default: None) The number of discriminants for transformation. Keeps the original dimensions of the dataset if None. Attributes w_ : array-like, shape=[n_features, n_discriminants] Projection matrix. e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/LinearDiscriminantAnalysis/ Methods fit(X, y, n_classes=None) Fit the LDA model with X. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_discriminants] Projected training vectors.","title":"LinearDiscriminantAnalysis"},{"location":"api_modules/mlxtend.feature_extraction/LinearDiscriminantAnalysis/#lineardiscriminantanalysis","text":"LinearDiscriminantAnalysis(n_discriminants=None) Linear Discriminant Analysis Class Parameters n_discriminants : int (default: None) The number of discriminants for transformation. 
Keeps the original dimensions of the dataset if None. Attributes w_ : array-like, shape=[n_features, n_discriminants] Projection matrix. e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/LinearDiscriminantAnalysis/","title":"LinearDiscriminantAnalysis"},{"location":"api_modules/mlxtend.feature_extraction/LinearDiscriminantAnalysis/#methods","text":"fit(X, y, n_classes=None) Fit the LDA model with X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.feature_extraction/LinearDiscriminantAnalysis/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.feature_extraction/LinearDiscriminantAnalysis/#license-bsd-3-clause","text":"set_params(**params) Set the parameters of this estimator. 
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.feature_extraction/LinearDiscriminantAnalysis/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.feature_extraction/LinearDiscriminantAnalysis/#license-bsd-3-clause_1","text":"transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_discriminants] Projected training vectors.","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.feature_extraction/PrincipalComponentAnalysis/","text":"PrincipalComponentAnalysis PrincipalComponentAnalysis(n_components=None, solver='eigen') Principal Component Analysis Class Parameters n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None. solver : str (default: 'eigen') Method for performing the matrix decomposition. {'eigen', 'svd'} Attributes w_ : array-like, shape=[n_features, n_components] Projection matrix. e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. loadings_ : array_like, shape=[n_features, n_features] The factor loadings of the original variables onto the principal components. 
The columns are the principal components, and the rows are the feature loadings. For instance, the first column contains the loadings onto the first principal component. Note that the signs may be flipped depending on whether you use the 'eigen' or 'svd' solver; this does not affect the interpretation of the loadings though. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/PrincipalComponentAnalysis/ Methods fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"PrincipalComponentAnalysis"},{"location":"api_modules/mlxtend.feature_extraction/PrincipalComponentAnalysis/#principalcomponentanalysis","text":"PrincipalComponentAnalysis(n_components=None, solver='eigen') Principal Component Analysis Class Parameters n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None. solver : str (default: 'eigen') Method for performing the matrix decomposition. {'eigen', 'svd'} Attributes w_ : array-like, shape=[n_features, n_components] Projection matrix. e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. loadings_ : array_like, shape=[n_features, n_features] The factor loadings of the original variables onto the principal components. The columns are the principal components, and the rows are the feature loadings. For instance, the first column contains the loadings onto the first principal component. Note that the signs may be flipped depending on whether you use the 'eigen' or 'svd' solver; this does not affect the interpretation of the loadings though. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/PrincipalComponentAnalysis/","title":"PrincipalComponentAnalysis"},{"location":"api_modules/mlxtend.feature_extraction/PrincipalComponentAnalysis/#methods","text":"fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. 
Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.feature_extraction/PrincipalComponentAnalysis/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.feature_extraction/PrincipalComponentAnalysis/#license-bsd-3-clause","text":"set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.feature_extraction/PrincipalComponentAnalysis/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.feature_extraction/PrincipalComponentAnalysis/#license-bsd-3-clause_1","text":"transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.feature_extraction/RBFKernelPCA/","text":"RBFKernelPCA RBFKernelPCA(gamma=15.0, n_components=None, copy_X=True) RBF Kernel Principal Component Analysis for dimensionality reduction. Parameters gamma : float (default: 15.0) Free parameter (coefficient) of the RBF kernel. n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None. copy_X : bool (default: True) Copies training data, which is required to compute the projection of new data via the transform method. Uses a reference to X if False. Attributes e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. X_projected_ : array-like, shape=[n_samples, n_components] Training samples projected along the component axes. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/RBFKernelPCA/ Methods fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). 
The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause transform(X) Apply the non-linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"RBFKernelPCA"},{"location":"api_modules/mlxtend.feature_extraction/RBFKernelPCA/#rbfkernelpca","text":"RBFKernelPCA(gamma=15.0, n_components=None, copy_X=True) RBF Kernel Principal Component Analysis for dimensionality reduction. Parameters gamma : float (default: 15.0) Free parameter (coefficient) of the RBF kernel. n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None. copy_X : bool (default: True) Copies training data, which is required to compute the projection of new data via the transform method. Uses a reference to X if False. Attributes e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. X_projected_ : array-like, shape=[n_samples, n_components] Training samples projected along the component axes. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/RBFKernelPCA/","title":"RBFKernelPCA"},{"location":"api_modules/mlxtend.feature_extraction/RBFKernelPCA/#methods","text":"fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.feature_extraction/RBFKernelPCA/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.feature_extraction/RBFKernelPCA/#license-bsd-3-clause","text":"set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.feature_extraction/RBFKernelPCA/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.feature_extraction/RBFKernelPCA/#license-bsd-3-clause_1","text":"transform(X) Apply the non-linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.feature_selection/ColumnSelector/","text":"ColumnSelector ColumnSelector(cols=None, drop_axis=False) Object for selecting specific columns from a data set. Parameters cols : array-like (default: None) A list specifying the feature indices to be selected. For example, [1, 4, 5] to select the 2nd, 5th, and 6th feature columns. If None, returns all columns in the array. drop_axis : bool (default: False) Drops the last axis if True and only one column is selected. This is useful, e.g., when the ColumnSelector is used for selecting only one column and the resulting array should be fed to, e.g., a scikit-learn column selector. For example, instead of returning an array with shape (n_samples, 1), drop_axis=True will return an array with shape (n_samples,). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ColumnSelector/ Methods fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values.
set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features.","title":"ColumnSelector"},{"location":"api_modules/mlxtend.feature_selection/ColumnSelector/#columnselector","text":"ColumnSelector(cols=None, drop_axis=False) Object for selecting specific columns from a data set. Parameters cols : array-like (default: None) A list specifying the feature indices to be selected. For example, [1, 4, 5] to select the 2nd, 5th, and 6th feature columns. If None, returns all columns in the array. drop_axis : bool (default: False) Drops the last axis if True and only one column is selected. This is useful, e.g., when the ColumnSelector is used for selecting only one column and the resulting array should be fed to, e.g., a scikit-learn column selector. For example, instead of returning an array with shape (n_samples, 1), drop_axis=True will return an array with shape (n_samples,). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ColumnSelector/","title":"ColumnSelector"},{"location":"api_modules/mlxtend.feature_selection/ColumnSelector/#methods","text":"fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features.","title":"Methods"},{"location":"api_modules/mlxtend.feature_selection/ExhaustiveFeatureSelector/","text":"ExhaustiveFeatureSelector ExhaustiveFeatureSelector(estimator, min_features=1, max_features=1, print_progress=True, scoring='accuracy', cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Exhaustive Feature Selection for Classification and Regression.
(new in v0.4.3) Parameters estimator : scikit-learn classifier or regressor min_features : int (default: 1) Minimum number of features to select. max_features : int (default: 1) Maximum number of features to select. print_progress : bool (default: True) Prints progress as the number of epochs to stderr. scoring : str (default: 'accuracy') Scoring metric in {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} for regressors, or a callable object or function with signature scorer(estimator, X, y). cv : int (default: 5) Scikit-learn cross-validation generator or int. If estimator is a classifier (or y consists of integer class labels), stratified k-fold is performed, and regular k-fold cross-validation otherwise. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs. An int, giving the exact number of total jobs that are spawned. A string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, cv=0 and n_jobs=1 must be set. Attributes best_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets.
best_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. best_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the exhaustive selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset), 'feature_names' (tuple of feature names of the feature subset), 'cv_scores' (list of individual cross-validation scores), 'avg_score' (average cross-validation score). Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/ Methods fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names and self.subsets_[i]['feature_names']. (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data and return the best selected features from X.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns Feature subset of X, shape = [n_samples, k_features] get_metric_dict(confidence_interval=0.95) Return metric dictionary. Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset; 'cv_scores': list with individual CV scores; 'avg_score': CV average score; 'std_dev': standard deviation of the CV score average; 'std_err': standard error of the CV score average; 'ci_bound': confidence interval bound of the CV score average. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X) Return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features.
New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. Returns Feature subset of X, shape = [n_samples, k_features]","title":"ExhaustiveFeatureSelector"},{"location":"api_modules/mlxtend.feature_selection/ExhaustiveFeatureSelector/#exhaustivefeatureselector","text":"ExhaustiveFeatureSelector(estimator, min_features=1, max_features=1, print_progress=True, scoring='accuracy', cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Exhaustive Feature Selection for Classification and Regression. (new in v0.4.3) Parameters estimator : scikit-learn classifier or regressor min_features : int (default: 1) Minimum number of features to select. max_features : int (default: 1) Maximum number of features to select. print_progress : bool (default: True) Prints progress as the number of epochs to stderr. scoring : str (default: 'accuracy') Scoring metric in {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} for regressors, or a callable object or function with signature scorer(estimator, X, y). cv : int (default: 5) Scikit-learn cross-validation generator or int. If estimator is a classifier (or y consists of integer class labels), stratified k-fold is performed, and regular k-fold cross-validation otherwise. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned.
Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs. An int, giving the exact number of total jobs that are spawned. A string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, cv=0 and n_jobs=1 must be set. Attributes best_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. best_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. best_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the exhaustive selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset), 'feature_names' (tuple of feature names of the feature subset), 'cv_scores' (list of individual cross-validation scores), 'avg_score' (average cross-validation score). Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0.
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/","title":"ExhaustiveFeatureSelector"},{"location":"api_modules/mlxtend.feature_selection/ExhaustiveFeatureSelector/#methods","text":"fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names and self.subsets_[i]['feature_names']. (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data and return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns Feature subset of X, shape = [n_samples, k_features] get_metric_dict(confidence_interval=0.95) Return metric dictionary. Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length.
The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset; 'cv_scores': list with individual CV scores; 'avg_score': CV average score; 'std_dev': standard deviation of the CV score average; 'std_err': standard error of the CV score average; 'ci_bound': confidence interval bound of the CV score average. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X) Return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. Returns Feature subset of X, shape = [n_samples, k_features]","title":"Methods"},{"location":"api_modules/mlxtend.feature_selection/SequentialFeatureSelector/","text":"SequentialFeatureSelector SequentialFeatureSelector(estimator, k_features=1, forward=True, floating=False, verbose=0, scoring=None, cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Sequential Feature Selection for Classification and Regression. Parameters estimator : scikit-learn classifier or regressor k_features : int or tuple or str (default: 1) Number of features to select, where k_features < the full feature set.
New in 0.4.2: A tuple containing a min and max value can be provided, and the SFS will return any feature combination between min and max that scored highest in cross-validation. For example, the tuple (1, 4) will return any combination from 1 up to 4 features instead of a fixed number of features k. New in 0.8.0: A string argument \"best\" or \"parsimonious\" can be provided. If \"best\" is provided, the feature selector will return the feature subset with the best cross-validation performance. If \"parsimonious\" is provided as an argument, the smallest feature subset that is within one standard error of the best cross-validation performance will be selected. forward : bool (default: True) Forward selection if True, backward selection otherwise. floating : bool (default: False) Adds a conditional exclusion/inclusion if True. verbose : int (default: 0) Level of verbosity to use in logging. If 0, no output; if 1, the number of features in the current set; if 2, detailed logging including timestamp and CV scores at each step. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. cv : int (default: 5) Integer or iterable yielding train, test splits. If cv is an integer and estimator is a classifier (or y consists of integer class labels), stratified k-fold cross-validation is performed. Otherwise, regular k-fold cross-validation is performed. No cross-validation if cv is None, False, or 0.
n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs. An int, giving the exact number of total jobs that are spawned. A string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, cv=0 and n_jobs=1 must be set. Attributes k_feature_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. k_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. k_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the sequential selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset), 'feature_names' (tuple of feature names of the feat.
subset), 'cv_scores' (list of individual cross-validation scores), 'avg_score' (average cross-validation score). Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/ Methods fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: pandas DataFrames are now also accepted as argument for y. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names and self.subsets_[i]['feature_names']. (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data then reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: a pandas Series is now also accepted as argument for y. fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier.
Returns Reduced feature subset of X, shape = [n_samples, k_features] get_metric_dict(confidence_interval=0.95) Return metric dictionary. Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset; 'cv_scores': list with individual CV scores; 'avg_score': CV average score; 'std_dev': standard deviation of the CV score average; 'std_err': standard error of the CV score average; 'ci_bound': confidence interval bound of the CV score average. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X) Reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X.
Returns Reduced feature subset of X, shape = [n_samples, k_features]","title":"SequentialFeatureSelector"},{"location":"api_modules/mlxtend.feature_selection/SequentialFeatureSelector/#sequentialfeatureselector","text":"SequentialFeatureSelector(estimator, k_features=1, forward=True, floating=False, verbose=0, scoring=None, cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Sequential Feature Selection for Classification and Regression. Parameters estimator : scikit-learn classifier or regressor k_features : int or tuple or str (default: 1) Number of features to select, where k_features < the full feature set. New in 0.4.2: A tuple containing a min and max value can be provided, and the SFS will return any feature combination between min and max that scored highest in cross-validation. For example, the tuple (1, 4) will return any combination from 1 up to 4 features instead of a fixed number of features k. New in 0.8.0: A string argument \"best\" or \"parsimonious\" can be provided. If \"best\" is provided, the feature selector will return the feature subset with the best cross-validation performance. If \"parsimonious\" is provided as an argument, the smallest feature subset that is within one standard error of the best cross-validation performance will be selected. forward : bool (default: True) Forward selection if True, backward selection otherwise. floating : bool (default: False) Adds a conditional exclusion/inclusion if True. verbose : int (default: 0) Level of verbosity to use in logging. If 0, no output; if 1, the number of features in the current set; if 2, detailed logging including timestamp and CV scores at each step. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors.
If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. cv : int (default: 5) Integer or iterable yielding train, test splits. If cv is an integer and estimator is a classifier (or y consists of integer class labels), stratified k-fold cross-validation is performed. Otherwise, regular k-fold cross-validation is performed. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs. An int, giving the exact number of total jobs that are spawned. A string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, cv=0 and n_jobs=1 must be set. Attributes k_feature_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. k_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets.
If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. k_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the sequential selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset), 'feature_names' (tuple of feature names of the feature subset), 'cv_scores' (list of individual cross-validation scores), 'avg_score' (average cross-validation score). Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/","title":"SequentialFeatureSelector"},{"location":"api_modules/mlxtend.feature_selection/SequentialFeatureSelector/#methods","text":"fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: pandas DataFrames are now also accepted as argument for y. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names and self.subsets_[i]['feature_names']. (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier.
Returns self : object fit_transform(X, y, **fit_params) Fit to training data then reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: a pandas Series is now also accepted as argument for y. fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns Reduced feature subset of X, shape = [n_samples, k_features] get_metric_dict(confidence_interval=0.95) Return metric dictionary. Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset; 'cv_scores': list with individual CV scores; 'avg_score': CV average score; 'std_dev': standard deviation of the CV score average; 'std_err': standard error of the CV score average; 'ci_bound': confidence interval bound of the CV score average. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns self transform(X) Reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. Returns Reduced feature subset of X, shape={n_samples, k_features}","title":"Methods"},{"location":"api_modules/mlxtend.file_io/find_filegroups/","text":"find_filegroups find_filegroups(paths, substring='', extensions=None, validity_check=True, ignore_invisible=True, rstrip='', ignore_substring=None) Find and collect files from different directories in a Python dictionary. Parameters paths : list Paths of the directories to be searched. Dictionary keys are built from the first directory. substring : str (default: '') Substring that all files have to contain to be considered. extensions : list (default: None) None or list of allowed file extensions for each path. If provided, the number of extensions must match the number of paths. validity_check : bool (default: True) If True, checks if all dictionary values have the same number of file paths. Prints a warning and returns an empty dictionary if the validity check failed. ignore_invisible : bool (default: True) If True, ignores invisible files (i.e., files starting with a period). rstrip : str (default: '') If provided, strips characters from the right side of the file base names after splitting the extension. Useful to trim different filenames to a common stem. E.g., \"abc_d.txt\" and \"abc_d_.csv\" would share the stem \"abc_d\" if rstrip is set to \"_\". ignore_substring : str (default: None) Ignores files that contain the specified substring. Returns groups : dict Dictionary of file paths. Keys are the file names found in the first directory listed in paths (without file extension). 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_filegroups/","title":"Find filegroups"},{"location":"api_modules/mlxtend.file_io/find_filegroups/#find_filegroups","text":"find_filegroups(paths, substring='', extensions=None, validity_check=True, ignore_invisible=True, rstrip='', ignore_substring=None) Find and collect files from different directories in a Python dictionary. Parameters paths : list Paths of the directories to be searched. Dictionary keys are built from the first directory. substring : str (default: '') Substring that all files have to contain to be considered. extensions : list (default: None) None or list of allowed file extensions for each path. If provided, the number of extensions must match the number of paths. validity_check : bool (default: True) If True, checks if all dictionary values have the same number of file paths. Prints a warning and returns an empty dictionary if the validity check failed. ignore_invisible : bool (default: True) If True, ignores invisible files (i.e., files starting with a period). rstrip : str (default: '') If provided, strips characters from the right side of the file base names after splitting the extension. Useful to trim different filenames to a common stem. E.g., \"abc_d.txt\" and \"abc_d_.csv\" would share the stem \"abc_d\" if rstrip is set to \"_\". ignore_substring : str (default: None) Ignores files that contain the specified substring. Returns groups : dict Dictionary of file paths. Keys are the file names found in the first directory listed in paths (without file extension). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_filegroups/","title":"find_filegroups"},{"location":"api_modules/mlxtend.file_io/find_files/","text":"find_files find_files(substring, path, recursive=False, check_ext=None, ignore_invisible=True, ignore_substring=None) Find files in a directory based on substring matching. 
Parameters substring : str Substring of the file to be matched. path : str Path where to look. recursive : bool If True, searches subdirectories recursively. check_ext : str If string (e.g., '.txt'), only returns files that match the specified file extension. ignore_invisible : bool If True, ignores invisible files (i.e., files starting with a period). ignore_substring : str Ignores files that contain the specified substring. Returns results : list List of the matched files. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_files/","title":"Find files"},{"location":"api_modules/mlxtend.file_io/find_files/#find_files","text":"find_files(substring, path, recursive=False, check_ext=None, ignore_invisible=True, ignore_substring=None) Find files in a directory based on substring matching. Parameters substring : str Substring of the file to be matched. path : str Path where to look. recursive : bool If True, searches subdirectories recursively. check_ext : str If string (e.g., '.txt'), only returns files that match the specified file extension. ignore_invisible : bool If True, ignores invisible files (i.e., files starting with a period). ignore_substring : str Ignores files that contain the specified substring. Returns results : list List of the matched files. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_files/","title":"find_files"},{"location":"api_modules/mlxtend.frequent_patterns/apriori/","text":"apriori apriori(df, min_support=0.5, use_colnames=False, max_len=None, n_jobs=1) Get frequent itemsets from a one-hot DataFrame. Parameters df : pandas DataFrame or pandas SparseDataFrame pandas DataFrame in one-hot encoded format. The allowed values are either 0/1 or True/False. 
For example, Apple Bananas Beer Chicken Milk Rice 0 1 0 1 1 0 1 1 1 0 1 0 0 1 2 1 0 1 0 0 0 3 1 1 0 0 0 0 4 0 0 1 1 1 1 5 0 0 1 0 1 1 6 0 0 1 0 1 0 7 1 1 0 0 0 0 min_support : float (default: 0.5) A float between 0 and 1 for minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions. use_colnames : bool (default: False) If True, uses the DataFrame's column names in the returned DataFrame instead of column indices. max_len : int (default: None) Maximum length of the itemsets generated. If None (default), all possible itemset lengths (under the apriori condition) are evaluated. Returns pandas DataFrame with columns ['support', 'itemsets'] of all itemsets whose support is >= min_support and whose length is <= max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/","title":"Apriori"},{"location":"api_modules/mlxtend.frequent_patterns/apriori/#apriori","text":"apriori(df, min_support=0.5, use_colnames=False, max_len=None, n_jobs=1) Get frequent itemsets from a one-hot DataFrame. Parameters df : pandas DataFrame or pandas SparseDataFrame pandas DataFrame in one-hot encoded format. The allowed values are either 0/1 or True/False. For example, Apple Bananas Beer Chicken Milk Rice 0 1 0 1 1 0 1 1 1 0 1 0 0 1 2 1 0 1 0 0 0 3 1 1 0 0 0 0 4 0 0 1 1 1 1 5 0 0 1 0 1 1 6 0 0 1 0 1 0 7 1 1 0 0 0 0 min_support : float (default: 0.5) A float between 0 and 1 for minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions. 
use_colnames : bool (default: False) If True, uses the DataFrame's column names in the returned DataFrame instead of column indices. max_len : int (default: None) Maximum length of the itemsets generated. If None (default), all possible itemset lengths (under the apriori condition) are evaluated. Returns pandas DataFrame with columns ['support', 'itemsets'] of all itemsets whose support is >= min_support and whose length is <= max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/","title":"apriori"},{"location":"api_modules/mlxtend.frequent_patterns/association_rules/","text":"association_rules association_rules(df, metric='confidence', min_threshold=0.8, support_only=False) Generates a DataFrame of association rules including the metrics 'support', 'confidence', and 'lift'. Parameters df : pandas DataFrame pandas DataFrame of frequent itemsets with columns ['support', 'itemsets']. metric : string (default: 'confidence') Metric to evaluate if a rule is of interest. Automatically set to 'support' if support_only=True. Otherwise, supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'. These metrics are computed as follows: - support(A->C) = support(A+C) [aka 'support'], range: [0, 1] - confidence(A->C) = support(A+C) / support(A), range: [0, 1] - lift(A->C) = confidence(A->C) / support(C), range: [0, inf] - leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1] - conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf] min_threshold : float (default: 0.8) Minimal threshold for the evaluation metric, via the metric parameter, to decide whether a candidate rule is of interest. 
support_only : bool (default: False) Only computes the rule support and fills the other metric columns with NaNs. This is useful if: a) the input DataFrame is incomplete, e.g., does not contain support values for all rule antecedents and consequents, or b) you simply want to speed up the computation because you don't need the other metrics. Returns pandas DataFrame with columns \"antecedents\" and \"consequents\" that store itemsets, plus the scoring metric columns: \"antecedent support\", \"consequent support\", \"support\", \"confidence\", \"lift\", \"leverage\", \"conviction\" of all rules for which metric(rule) >= min_threshold. Each entry in the \"antecedents\" and \"consequents\" columns is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/","title":"Association rules"},{"location":"api_modules/mlxtend.frequent_patterns/association_rules/#association_rules","text":"association_rules(df, metric='confidence', min_threshold=0.8, support_only=False) Generates a DataFrame of association rules including the metrics 'support', 'confidence', and 'lift'. Parameters df : pandas DataFrame pandas DataFrame of frequent itemsets with columns ['support', 'itemsets']. metric : string (default: 'confidence') Metric to evaluate if a rule is of interest. Automatically set to 'support' if support_only=True. 
Otherwise, supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'. These metrics are computed as follows: - support(A->C) = support(A+C) [aka 'support'], range: [0, 1] - confidence(A->C) = support(A+C) / support(A), range: [0, 1] - lift(A->C) = confidence(A->C) / support(C), range: [0, inf] - leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1] - conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf] min_threshold : float (default: 0.8) Minimal threshold for the evaluation metric, via the metric parameter, to decide whether a candidate rule is of interest. support_only : bool (default: False) Only computes the rule support and fills the other metric columns with NaNs. This is useful if: a) the input DataFrame is incomplete, e.g., does not contain support values for all rule antecedents and consequents, or b) you simply want to speed up the computation because you don't need the other metrics. Returns pandas DataFrame with columns \"antecedents\" and \"consequents\" that store itemsets, plus the scoring metric columns: \"antecedent support\", \"consequent support\", \"support\", \"confidence\", \"lift\", \"leverage\", \"conviction\" of all rules for which metric(rule) >= min_threshold. Each entry in the \"antecedents\" and \"consequents\" columns is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/","title":"association_rules"},{"location":"api_modules/mlxtend.image/extract_face_landmarks/","text":"extract_face_landmarks extract_face_landmarks(img, return_dtype=<class 'numpy.int32'>) Function to extract face landmarks. Note that this function requires an installation of the Python version of the library \"dlib\": http://dlib.net Parameters img : array, shape = [h, w, ?] 
numpy array of a face image. Supported shapes are: - 3D tensors with 1 or more color channels, for example, RGB: [h, w, 3] - 2D tensors without a color channel, for example, Grayscale: [h, w] return_dtype: the return data-type of the array, default: np.int32. Returns landmarks : numpy.ndarray, shape = [68, 2] A numpy array, where each row contains the x-y coordinates of a landmark/point. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/sources/image/extract_face_landmarks.ipynb","title":"Extract face landmarks"},{"location":"api_modules/mlxtend.image/extract_face_landmarks/#extract_face_landmarks","text":"extract_face_landmarks(img, return_dtype=<class 'numpy.int32'>) Function to extract face landmarks. Note that this function requires an installation of the Python version of the library \"dlib\": http://dlib.net Parameters img : array, shape = [h, w, ?] numpy array of a face image. Supported shapes are: - 3D tensors with 1 or more color channels, for example, RGB: [h, w, 3] - 2D tensors without a color channel, for example, Grayscale: [h, w] return_dtype: the return data-type of the array, default: np.int32. Returns landmarks : numpy.ndarray, shape = [68, 2] A numpy array, where each row contains the x-y coordinates of a landmark/point. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/sources/image/extract_face_landmarks.ipynb","title":"extract_face_landmarks"},{"location":"api_modules/mlxtend.math/factorial/","text":"factorial factorial(n) None","title":"Factorial"},{"location":"api_modules/mlxtend.math/factorial/#factorial","text":"factorial(n) None","title":"factorial"},{"location":"api_modules/mlxtend.math/num_combinations/","text":"num_combinations num_combinations(n, k, with_replacement=False) Function to calculate the number of possible combinations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. 
with_replacement : bool (default: False) Allows repeated elements if True. 
Returns comb : int Number of possible combinations. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_combinations/","title":"Num combinations"},{"location":"api_modules/mlxtend.math/num_combinations/#num_combinations","text":"num_combinations(n, k, with_replacement=False) Function to calculate the number of possible combinations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool (default: False) Allows repeated elements if True. Returns comb : int Number of possible combinations. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_combinations/","title":"num_combinations"},{"location":"api_modules/mlxtend.math/num_permutations/","text":"num_permutations num_permutations(n, k, with_replacement=False) Function to calculate the number of possible permutations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool Allows repeated elements if True. Returns permut : int Number of possible permutations. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_permutations/","title":"Num permutations"},{"location":"api_modules/mlxtend.math/num_permutations/#num_permutations","text":"num_permutations(n, k, with_replacement=False) Function to calculate the number of possible permutations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool Allows repeated elements if True. Returns permut : int Number of possible permutations. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_permutations/","title":"num_permutations"},{"location":"api_modules/mlxtend.math/vectorspace_dimensionality/","text":"vectorspace_dimensionality vectorspace_dimensionality(ary) Computes the hyper-volume spanned by a vector set. Parameters ary : array-like, shape=[num_vectors, num_vectors] An orthogonal set of vectors (arranged as columns in a matrix). Returns dimensions : int An integer indicating the \"dimensionality\" (hyper-volume) spanned by the vector set.","title":"Vectorspace dimensionality"},{"location":"api_modules/mlxtend.math/vectorspace_dimensionality/#vectorspace_dimensionality","text":"vectorspace_dimensionality(ary) Computes the hyper-volume spanned by a vector set. Parameters ary : array-like, shape=[num_vectors, num_vectors] An orthogonal set of vectors (arranged as columns in a matrix). Returns dimensions : int An integer indicating the \"dimensionality\" (hyper-volume) spanned by the vector set.","title":"vectorspace_dimensionality"},{"location":"api_modules/mlxtend.math/vectorspace_orthonormalization/","text":"vectorspace_orthonormalization vectorspace_orthonormalization(ary, eps=1e-13) Transforms a set of column vectors to an orthonormal basis. Given a set of orthogonal vectors, this function converts such column vectors, arranged in a matrix, into orthonormal basis vectors. Parameters ary : array-like, shape=[num_vectors, num_vectors] An orthogonal set of vectors (arranged as columns in a matrix). eps : float (default: 1e-13) A small tolerance value to determine whether the vector norm is zero or not. Returns arr : array-like, shape=[num_vectors, num_vectors] An orthonormal set of vectors (arranged as columns).","title":"Vectorspace orthonormalization"},{"location":"api_modules/mlxtend.math/vectorspace_orthonormalization/#vectorspace_orthonormalization","text":"vectorspace_orthonormalization(ary, eps=1e-13) Transforms a set of column vectors to an orthonormal basis. 
Given a set of orthogonal vectors, this function converts such column vectors, arranged in a matrix, into orthonormal basis vectors. Parameters ary : array-like, shape=[num_vectors, num_vectors] An orthogonal set of vectors (arranged as columns in a matrix). eps : float (default: 1e-13) A small tolerance value to determine whether the vector norm is zero or not. Returns arr : array-like, shape=[num_vectors, num_vectors] An orthonormal set of vectors (arranged as columns).","title":"vectorspace_orthonormalization"},{"location":"api_modules/mlxtend.plotting/category_scatter/","text":"category_scatter category_scatter(x, y, label_col, data, markers='sxo^v', colors=('blue', 'green', 'red', 'purple', 'gray', 'cyan'), alpha=0.7, markersize=20.0, legend_loc='best') Scatter plot to plot categories in different colors/markerstyles. Parameters x : str or int DataFrame column name of the x-axis values or integer for the numpy ndarray column index. y : str or int DataFrame column name of the y-axis values or integer for the numpy ndarray column index. data : Pandas DataFrame object or NumPy ndarray. markers : str Markers that are cycled through the label category. colors : tuple Colors that are cycled through the label category. alpha : float (default: 0.7) Parameter to control the transparency. markersize : float (default: 20.0) Parameter to control the marker size. 
legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/","title":"Category scatter"},{"location":"api_modules/mlxtend.plotting/category_scatter/#category_scatter","text":"category_scatter(x, y, label_col, data, markers='sxo^v', colors=('blue', 'green', 'red', 'purple', 'gray', 'cyan'), alpha=0.7, markersize=20.0, legend_loc='best') Scatter plot to plot categories in different colors/markerstyles. Parameters x : str or int DataFrame column name of the x-axis values or integer for the numpy ndarray column index. y : str or int DataFrame column name of the y-axis values or integer for the numpy ndarray column index. data : Pandas DataFrame object or NumPy ndarray. markers : str Markers that are cycled through the label category. colors : tuple Colors that are cycled through the label category. alpha : float (default: 0.7) Parameter to control the transparency. markersize : float (default: 20.0) Parameter to control the marker size. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/","title":"category_scatter"},{"location":"api_modules/mlxtend.plotting/checkerboard_plot/","text":"checkerboard_plot checkerboard_plot(ary, cell_colors=('white', 'black'), font_colors=('black', 'white'), fmt='%.1f', figsize=None, row_labels=None, col_labels=None, fontsize=None) Plot a checkerboard table / heatmap via matplotlib. Parameters ary : array-like, shape = [n, m] A 2D NumPy array. 
cell_colors : tuple or list (default: ('white', 'black')) Tuple or list containing the two colors of the checkerboard pattern. font_colors : tuple or list (default: ('black', 'white')) Font colors corresponding to the cell colors. figsize : tuple (default: (2.5, 2.5)) Height and width of the figure. fmt : str (default: '%.1f') Python string formatter for cell values. The default '%.1f' results in floats with 1 digit after the decimal point. Use '%d' to show numbers as integers. row_labels : list (default: None) List of the row labels. Uses the array row indices 0 to n-1 by default. col_labels : list (default: None) List of the column labels. Uses the array column indices 0 to m-1 by default. fontsize : int (default: None) Specifies the font size of the checkerboard table. Uses matplotlib's default if None. Returns fig : matplotlib Figure object. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/checkerboard_plot/","title":"Checkerboard plot"},{"location":"api_modules/mlxtend.plotting/checkerboard_plot/#checkerboard_plot","text":"checkerboard_plot(ary, cell_colors=('white', 'black'), font_colors=('black', 'white'), fmt='%.1f', figsize=None, row_labels=None, col_labels=None, fontsize=None) Plot a checkerboard table / heatmap via matplotlib. Parameters ary : array-like, shape = [n, m] A 2D NumPy array. cell_colors : tuple or list (default: ('white', 'black')) Tuple or list containing the two colors of the checkerboard pattern. font_colors : tuple or list (default: ('black', 'white')) Font colors corresponding to the cell colors. figsize : tuple (default: (2.5, 2.5)) Height and width of the figure. fmt : str (default: '%.1f') Python string formatter for cell values. The default '%.1f' results in floats with 1 digit after the decimal point. Use '%d' to show numbers as integers. row_labels : list (default: None) List of the row labels. Uses the array row indices 0 to n-1 by default. 
col_labels : list (default: None) List of the column labels. Uses the array column indices 0 to m-1 by default. fontsize : int (default: None) Specifies the font size of the checkerboard table. Uses matplotlib's default if None. Returns fig : matplotlib Figure object. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/checkerboard_plot/","title":"checkerboard_plot"},{"location":"api_modules/mlxtend.plotting/ecdf/","text":"ecdf ecdf(x, y_label='ECDF', x_label=None, ax=None, percentile=None, ecdf_color=None, ecdf_marker='o', percentile_color='black', percentile_linestyle='--') Plots an Empirical Cumulative Distribution Function. Parameters x : array or list, shape=[n_samples,] Array-like object containing the feature values. y_label : str (default='ECDF') Text label for the y-axis. x_label : str (default=None) Text label for the x-axis. ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None. percentile : float (default=None) Float between 0 and 1 for plotting a percentile threshold line. ecdf_color : matplotlib color (default=None) Color for the ECDF plot; uses matplotlib defaults if None. ecdf_marker : matplotlib marker (default='o') Marker style for the ECDF plot. percentile_color : matplotlib color (default='black') Color for the percentile threshold if percentile is not None. percentile_linestyle : matplotlib linestyle (default='--') Line style for the percentile threshold if percentile is not None. Returns ax : matplotlib.axes.Axes object percentile_threshold : float Feature threshold at the percentile, or None if percentile=None. percentile_count : int Number of samples whose feature value is less than or equal to the feature threshold at the percentile, or None if percentile=None. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/ecdf/","title":"Ecdf"},{"location":"api_modules/mlxtend.plotting/ecdf/#ecdf","text":"ecdf(x, 
y_label='ECDF', x_label=None, ax=None, percentile=None, ecdf_color=None, ecdf_marker='o', percentile_color='black', percentile_linestyle='--') Plots an Empirical Cumulative Distribution Function. Parameters x : array or list, shape=[n_samples,] Array-like object containing the feature values. y_label : str (default='ECDF') Text label for the y-axis. x_label : str (default=None) Text label for the x-axis. ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None. percentile : float (default=None) Float between 0 and 1 for plotting a percentile threshold line. ecdf_color : matplotlib color (default=None) Color for the ECDF plot; uses matplotlib defaults if None. ecdf_marker : matplotlib marker (default='o') Marker style for the ECDF plot. percentile_color : matplotlib color (default='black') Color for the percentile threshold if percentile is not None. percentile_linestyle : matplotlib linestyle (default='--') Line style for the percentile threshold if percentile is not None. Returns ax : matplotlib.axes.Axes object percentile_threshold : float Feature threshold at the percentile, or None if percentile=None. percentile_count : int Number of samples whose feature value is less than or equal to the feature threshold at the percentile, or None if percentile=None. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/ecdf/","title":"ecdf"},{"location":"api_modules/mlxtend.plotting/enrichment_plot/","text":"enrichment_plot enrichment_plot(df, colors='bgrkcy', markers=' ', linestyles='-', alpha=0.5, lw=2, where='post', grid=True, count_label='Count', xlim='auto', ylim='auto', invert_axes=False, legend_loc='best', ax=None) Plot stacked barplots. Parameters df : pandas.DataFrame A pandas DataFrame where columns represent the different categories. colors : str (default: 'bgrkcy') The colors of the bars. 
markers : str (default: ' ') Matplotlib markerstyles, e.g., 'sov' for square, circle, and triangle markers. linestyles : str (default: '-') Matplotlib linestyles, e.g., '-,--' to cycle normal and dashed lines. Note that the different linestyles need to be separated by commas. alpha : float (default: 0.5) Transparency level from 0.0 to 1.0. lw : int or float (default: 2) Linewidth parameter. where : {'post', 'pre', 'mid'} (default: 'post') Starting location of the steps. grid : bool (default: True) Plots a grid if True. count_label : str (default: 'Count') Label for the \"Count\"-axis. xlim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the x-axis range. ylim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the y-axis range. invert_axes : bool (default: False) Plots count on the x-axis if True. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. ax : matplotlib axis, optional (default: None) Use this axis for plotting, or create a new one otherwise. Returns ax : matplotlib axis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/enrichment_plot/","title":"Enrichment plot"},{"location":"api_modules/mlxtend.plotting/enrichment_plot/#enrichment_plot","text":"enrichment_plot(df, colors='bgrkcy', markers=' ', linestyles='-', alpha=0.5, lw=2, where='post', grid=True, count_label='Count', xlim='auto', ylim='auto', invert_axes=False, legend_loc='best', ax=None) Plot stacked barplots. Parameters df : pandas.DataFrame A pandas DataFrame where columns represent the different categories. colors : str (default: 'bgrkcy') The colors of the bars. markers : str (default: ' ') Matplotlib markerstyles, e.g., 'sov' for square, circle, and triangle markers. linestyles : str (default: '-') Matplotlib linestyles, e.g., '-,--' to cycle normal and dashed lines. 
Note that the different linestyles need to be separated by commas. alpha : float (default: 0.5) Transparency level from 0.0 to 1.0. lw : int or float (default: 2) Linewidth parameter. where : {'post', 'pre', 'mid'} (default: 'post') Starting location of the steps. grid : bool (default: True) Plots a grid if True. count_label : str (default: 'Count') Label for the \"Count\"-axis. xlim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the x-axis range. ylim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the y-axis range. invert_axes : bool (default: False) Plots count on the x-axis if True. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. ax : matplotlib axis, optional (default: None) Use this axis for plotting, or create a new one otherwise. Returns ax : matplotlib axis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/enrichment_plot/","title":"enrichment_plot"},{"location":"api_modules/mlxtend.plotting/plot_confusion_matrix/","text":"plot_confusion_matrix plot_confusion_matrix(conf_mat, hide_spines=False, hide_ticks=False, figsize=None, cmap=None, colorbar=False, show_absolute=True, show_normed=False) Plot a confusion matrix via matplotlib. Parameters conf_mat : array-like, shape = [n_classes, n_classes] Confusion matrix from evaluate.confusion_matrix. hide_spines : bool (default: False) Hides axis spines if True. hide_ticks : bool (default: False) Hides axis ticks if True. figsize : tuple (default: (2.5, 2.5)) Height and width of the figure. cmap : matplotlib colormap (default: None) Uses matplotlib.pyplot.cm.Blues if None. colorbar : bool (default: False) Shows a colorbar if True. show_absolute : bool (default: True) Shows absolute confusion matrix coefficients if True. At least one of show_absolute or show_normed must be True. 
show_normed : bool (default: False) Shows normed confusion matrix coefficients if True. The normed confusion matrix coefficients give the proportion of training examples per class that are assigned the correct label. At least one of show_absolute or show_normed must be True. Returns fig, ax : matplotlib.pyplot subplot objects Figure and axis elements of the subplot. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_confusion_matrix/","title":"Plot confusion matrix"},{"location":"api_modules/mlxtend.plotting/plot_confusion_matrix/#plot_confusion_matrix","text":"plot_confusion_matrix(conf_mat, hide_spines=False, hide_ticks=False, figsize=None, cmap=None, colorbar=False, show_absolute=True, show_normed=False) Plot a confusion matrix via matplotlib. Parameters conf_mat : array-like, shape = [n_classes, n_classes] Confusion matrix from evaluate.confusion_matrix. hide_spines : bool (default: False) Hides axis spines if True. hide_ticks : bool (default: False) Hides axis ticks if True. figsize : tuple (default: (2.5, 2.5)) Height and width of the figure. cmap : matplotlib colormap (default: None) Uses matplotlib.pyplot.cm.Blues if None. colorbar : bool (default: False) Shows a colorbar if True. show_absolute : bool (default: True) Shows absolute confusion matrix coefficients if True. At least one of show_absolute or show_normed must be True. show_normed : bool (default: False) Shows normed confusion matrix coefficients if True. The normed confusion matrix coefficients give the proportion of training examples per class that are assigned the correct label. At least one of show_absolute or show_normed must be True. Returns fig, ax : matplotlib.pyplot subplot objects Figure and axis elements of the subplot. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_confusion_matrix/","title":"plot_confusion_matrix"},{"location":"api_modules/mlxtend.plotting/plot_decision_regions/","text":"plot_decision_regions plot_decision_regions(X, y, clf, feature_index=None, filler_feature_values=None, filler_feature_ranges=None, ax=None, X_highlight=None, res=None, legend=1, hide_spines=True, markers='s^oxv<>', colors='#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf', scatter_kwargs=None, contourf_kwargs=None, scatter_highlight_kwargs=None) Plot decision regions of a classifier. Please note that this function assumes that class labels are labeled consecutively, e.g., 0, 1, 2, 3, 4, and 5. If you have class labels with integer labels > 4, you may want to provide additional colors and/or markers as colors and markers arguments. See http://matplotlib.org/examples/color/named_colors.html for more information. Parameters X : array-like, shape = [n_samples, n_features] Feature matrix. y : array-like, shape = [n_samples] True class labels. clf : Classifier object. Must have a .predict method. feature_index : array-like (default: (0,) for 1D, (0, 1) otherwise) Feature indices to use for plotting. The first index in feature_index will be on the x-axis, the second index will be on the y-axis. filler_feature_values : dict (default: None) Only needed for number of features > 2. Dictionary of feature index-value pairs for the features not being plotted. filler_feature_ranges : dict (default: None) Only needed for number of features > 2. Dictionary of feature index-value pairs for the features not being plotted. Will use the ranges provided to select training samples for plotting. ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None. X_highlight : array-like, shape = [n_samples, n_features] (default: None) An array with data points that are used to highlight samples in X . 
res : float or array-like, shape = (2,) (default: None) This parameter was used to define the grid width, but it has been deprecated in favor of determining the number of points given the figure DPI and size automatically for optimal results and computational efficiency. To increase the resolution, it is recommended to provide a dpi argument via matplotlib, e.g., plt.figure(dpi=600). hide_spines : bool (default: True) Hide axis spines if True. legend : int (default: 1) Integer to specify the legend location. No legend if legend is 0. markers : str (default: 's^oxv<>') Scatterplot markers. colors : str (default: '#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf') Comma-separated list of colors. scatter_kwargs : dict (default: None) Keyword arguments for underlying matplotlib scatter function. contourf_kwargs : dict (default: None) Keyword arguments for underlying matplotlib contourf function. scatter_highlight_kwargs : dict (default: None) Keyword arguments for underlying matplotlib scatter function. Returns ax : matplotlib.axes.Axes object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/","title":"Plot decision regions"},{"location":"api_modules/mlxtend.plotting/plot_decision_regions/#plot_decision_regions","text":"plot_decision_regions(X, y, clf, feature_index=None, filler_feature_values=None, filler_feature_ranges=None, ax=None, X_highlight=None, res=None, legend=1, hide_spines=True, markers='s^oxv<>', colors='#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf', scatter_kwargs=None, contourf_kwargs=None, scatter_highlight_kwargs=None) Plot decision regions of a classifier. Please note that this function assumes that class labels are labeled consecutively, e.g., 0, 1, 2, 3, 4, and 5. If you have class labels with integer labels > 4, you may want to provide additional colors and/or markers as colors and markers arguments. 
See http://matplotlib.org/examples/color/named_colors.html for more information. Parameters X : array-like, shape = [n_samples, n_features] Feature matrix. y : array-like, shape = [n_samples] True class labels. clf : Classifier object. Must have a .predict method. feature_index : array-like (default: (0,) for 1D, (0, 1) otherwise) Feature indices to use for plotting. The first index in feature_index will be on the x-axis, the second index will be on the y-axis. filler_feature_values : dict (default: None) Only needed for number of features > 2. Dictionary of feature index-value pairs for the features not being plotted. filler_feature_ranges : dict (default: None) Only needed for number of features > 2. Dictionary of feature index-value pairs for the features not being plotted. Will use the ranges provided to select training samples for plotting. ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None. X_highlight : array-like, shape = [n_samples, n_features] (default: None) An array with data points that are used to highlight samples in X . res : float or array-like, shape = (2,) (default: None) This parameter was used to define the grid width, but it has been deprecated in favor of determining the number of points given the figure DPI and size automatically for optimal results and computational efficiency. To increase the resolution, it is recommended to provide a dpi argument via matplotlib, e.g., plt.figure(dpi=600). hide_spines : bool (default: True) Hide axis spines if True. legend : int (default: 1) Integer to specify the legend location. No legend if legend is 0. markers : str (default: 's^oxv<>') Scatterplot markers. colors : str (default: '#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf') Comma-separated list of colors. scatter_kwargs : dict (default: None) Keyword arguments for underlying matplotlib scatter function. contourf_kwargs : dict (default: None) Keyword arguments for underlying matplotlib contourf function. 
scatter_highlight_kwargs : dict (default: None) Keyword arguments for underlying matplotlib scatter function. Returns ax : matplotlib.axes.Axes object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/","title":"plot_decision_regions"},{"location":"api_modules/mlxtend.plotting/plot_learning_curves/","text":"plot_learning_curves plot_learning_curves(X_train, y_train, X_test, y_test, clf, train_marker='o', test_marker='^', scoring='misclassification error', suppress_plot=False, print_model=True, style='fivethirtyeight', legend_loc='best') Plots learning curves of a classifier. Parameters X_train : array-like, shape = [n_samples, n_features] Feature matrix of the training dataset. y_train : array-like, shape = [n_samples] True class labels of the training dataset. X_test : array-like, shape = [n_samples, n_features] Feature matrix of the test dataset. y_test : array-like, shape = [n_samples] True class labels of the test dataset. clf : Classifier object. Must have .predict and .fit methods. train_marker : str (default: 'o') Marker for the training set line plot. test_marker : str (default: '^') Marker for the test set line plot. scoring : str (default: 'misclassification error') If not 'misclassification error', accepts the following metrics (from scikit-learn): {'accuracy', 'average_precision', 'f1_micro', 'f1_macro', 'f1_weighted', 'f1_samples', 'log_loss', 'precision', 'recall', 'roc_auc', 'adjusted_rand_score', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} suppress_plot : bool (default: False) Suppress matplotlib plots if True. Recommended for testing purposes. print_model : bool (default: True) Print model parameters in plot title if True. 
style : str (default: 'fivethirtyeight') Matplotlib style legend_loc : str (default: 'best') Where to place the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'} Returns errors : (training_error, test_error): tuple of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_learning_curves/","title":"Plot learning curves"},{"location":"api_modules/mlxtend.plotting/plot_learning_curves/#plot_learning_curves","text":"plot_learning_curves(X_train, y_train, X_test, y_test, clf, train_marker='o', test_marker='^', scoring='misclassification error', suppress_plot=False, print_model=True, style='fivethirtyeight', legend_loc='best') Plots learning curves of a classifier. Parameters X_train : array-like, shape = [n_samples, n_features] Feature matrix of the training dataset. y_train : array-like, shape = [n_samples] True class labels of the training dataset. X_test : array-like, shape = [n_samples, n_features] Feature matrix of the test dataset. y_test : array-like, shape = [n_samples] True class labels of the test dataset. clf : Classifier object. Must have a .predict .fit method. train_marker : str (default: 'o') Marker for the training set line plot. test_marker : str (default: '^') Marker for the test set line plot. scoring : str (default: 'misclassification error') If not 'misclassification error', accepts the following metrics (from scikit-learn): {'accuracy', 'average_precision', 'f1_micro', 'f1_macro', 'f1_weighted', 'f1_samples', 'log_loss', 'precision', 'recall', 'roc_auc', 'adjusted_rand_score', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} suppress_plot=False : bool (default: False) Suppress matplotlib plots if True. Recommended for testing purposes. print_model : bool (default: True) Print model parameters in plot title if True. 
style : str (default: 'fivethirtyeight') Matplotlib style legend_loc : str (default: 'best') Where to place the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'} Returns errors : (training_error, test_error): tuple of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_learning_curves/","title":"plot_learning_curves"},{"location":"api_modules/mlxtend.plotting/plot_linear_regression/","text":"plot_linear_regression plot_linear_regression(X, y, model=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False), corr_func='pearsonr', scattercolor='blue', fit_style='k--', legend=True, xlim='auto') Plot a linear regression line fit. Parameters X : numpy array, shape = [n_samples,] Samples. y : numpy array, shape (n_samples,) Target values model: object (default: sklearn.linear_model.LinearRegression) Estimator object for regression. Must implement a .fit() and .predict() method. corr_func: str or function (default: 'pearsonr') Uses pearsonr from scipy.stats if corr_func='pearsonr'. to compute the regression slope. If not 'pearsonr', the corr_func , the corr_func parameter expects a function of the form func( , ) as inputs, which is expected to return a tuple (, ) . scattercolor: string (default: blue) Color of scatter plot points. fit_style: string (default: k--) Style for the line fit. legend: bool (default: True) Plots legend with corr_coeff coef., fit coef., and intercept values. xlim: array-like (x_min, x_max) or 'auto' (default: 'auto') X-axis limits for the linear line fit. 
Returns regression_fit : tuple intercept, slope, corr_coeff (float, float, float) Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_linear_regression/","title":"Plot linear regression"},{"location":"api_modules/mlxtend.plotting/plot_linear_regression/#plot_linear_regression","text":"plot_linear_regression(X, y, model=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False), corr_func='pearsonr', scattercolor='blue', fit_style='k--', legend=True, xlim='auto') Plot a linear regression line fit. Parameters X : numpy array, shape = [n_samples,] Samples. y : numpy array, shape (n_samples,) Target values. model: object (default: sklearn.linear_model.LinearRegression) Estimator object for regression. Must implement .fit() and .predict() methods. corr_func: str or function (default: 'pearsonr') Uses pearsonr from scipy.stats to compute the correlation coefficient if corr_func='pearsonr'. Otherwise, the corr_func parameter expects a function that takes the x and y arrays as inputs and returns a tuple whose first element is the correlation coefficient. scattercolor: string (default: blue) Color of scatter plot points. fit_style: string (default: k--) Style for the line fit. legend: bool (default: True) Plots legend with corr_coeff coef., fit coef., and intercept values. xlim: array-like (x_min, x_max) or 'auto' (default: 'auto') X-axis limits for the linear line fit. Returns regression_fit : tuple intercept, slope, corr_coeff (float, float, float) Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_linear_regression/","title":"plot_linear_regression"},{"location":"api_modules/mlxtend.plotting/plot_sequential_feature_selection/","text":"plot_sequential_feature_selection plot_sequential_feature_selection(metric_dict, kind='std_dev', color='blue', bcolor='steelblue', marker='o', alpha=0.2, ylabel='Performance', confidence_interval=0.95) Plot feature selection results. 
Parameters metric_dict : mlxtend.SequentialFeatureSelector.get_metric_dict() object kind : str (default: \"std_dev\") The kind of error bar or confidence interval in {'std_dev', 'std_err', 'ci', None}. color : str (default: \"blue\") Color of the lineplot (accepts any matplotlib color name) bcolor : str (default: \"steelblue\"). Color of the error bars / confidence intervals (accepts any matplotlib color name). marker : str (default: \"o\") Marker of the line plot (accepts any matplotlib marker name). alpha : float in [0, 1] (default: 0.2) Transparency of the error bars / confidence intervals. ylabel : str (default: \"Performance\") Y-axis label. confidence_interval : float (default: 0.95) Confidence level if kind='ci' . Returns fig : matplotlib.pyplot.figure() object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_sequential_feature_selection/","title":"Plot sequential feature selection"},{"location":"api_modules/mlxtend.plotting/plot_sequential_feature_selection/#plot_sequential_feature_selection","text":"plot_sequential_feature_selection(metric_dict, kind='std_dev', color='blue', bcolor='steelblue', marker='o', alpha=0.2, ylabel='Performance', confidence_interval=0.95) Plot feature selection results. Parameters metric_dict : mlxtend.SequentialFeatureSelector.get_metric_dict() object kind : str (default: \"std_dev\") The kind of error bar or confidence interval in {'std_dev', 'std_err', 'ci', None}. color : str (default: \"blue\") Color of the lineplot (accepts any matplotlib color name) bcolor : str (default: \"steelblue\"). Color of the error bars / confidence intervals (accepts any matplotlib color name). marker : str (default: \"o\") Marker of the line plot (accepts any matplotlib marker name). alpha : float in [0, 1] (default: 0.2) Transparency of the error bars / confidence intervals. ylabel : str (default: \"Performance\") Y-axis label. 
confidence_interval : float (default: 0.95) Confidence level if kind='ci' . Returns fig : matplotlib.pyplot.figure() object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_sequential_feature_selection/","title":"plot_sequential_feature_selection"},{"location":"api_modules/mlxtend.plotting/remove_borders/","text":"remove_borders remove_borders(axes, left=False, bottom=False, right=True, top=True) Remove chart junk from matplotlib plots. Parameters axes : iterable An iterable containing plt.gca() or plt.subplot() objects, e.g. [plt.gca()]. left : bool (default: False ) Hide left axis spine if True. bottom : bool (default: False ) Hide bottom axis spine if True. right : bool (default: True ) Hide right axis spine if True. top : bool (default: True ) Hide top axis spine if True. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/remove_chartjunk/","title":"Remove borders"},{"location":"api_modules/mlxtend.plotting/remove_borders/#remove_borders","text":"remove_borders(axes, left=False, bottom=False, right=True, top=True) Remove chart junk from matplotlib plots. Parameters axes : iterable An iterable containing plt.gca() or plt.subplot() objects, e.g. [plt.gca()]. left : bool (default: False ) Hide left axis spine if True. bottom : bool (default: False ) Hide bottom axis spine if True. right : bool (default: True ) Hide right axis spine if True. top : bool (default: True ) Hide top axis spine if True. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/remove_chartjunk/","title":"remove_borders"},{"location":"api_modules/mlxtend.plotting/scatterplotmatrix/","text":"scatterplotmatrix scatterplotmatrix(X, fig_axes=None, names=None, figsize=(8, 8), alpha=1.0, **kwargs) Lower triangle of a scatterplot matrix. Parameters X : array-like, shape={num_examples, num_features} Design matrix containing data instances (examples) with multiple exploratory variables (features). fig_axes : tuple (default: None) A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplots function fig, axes = plt.subplots(...) names : list (default: None) A list of string names, which should have the same number of elements as there are features (columns) in X . figsize : tuple (default: (8, 8)) Height and width of the subplot grid. Ignored if fig_axes is not None . alpha : float (default: 1.0) Transparency for both the scatter plots and the histograms along the diagonal. **kwargs : kwargs Keyword arguments for the scatterplots. Returns fig_axes : tuple A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplots function fig, axes = plt.subplots(...)","title":"Scatterplotmatrix"},{"location":"api_modules/mlxtend.plotting/scatterplotmatrix/#scatterplotmatrix","text":"scatterplotmatrix(X, fig_axes=None, names=None, figsize=(8, 8), alpha=1.0, **kwargs) Lower triangle of a scatterplot matrix. Parameters X : array-like, shape={num_examples, num_features} Design matrix containing data instances (examples) with multiple exploratory variables (features). fig_axes : tuple (default: None) A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplots function fig, axes = plt.subplots(...) 
names : list (default: None) A list of string names, which should have the same number of elements as there are features (columns) in X . figsize : tuple (default: (8, 8)) Height and width of the subplot grid. Ignored if fig_axes is not None . alpha : float (default: 1.0) Transparency for both the scatter plots and the histograms along the diagonal. **kwargs : kwargs Keyword arguments for the scatterplots. Returns fig_axes : tuple A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplots function fig, axes = plt.subplots(...)","title":"scatterplotmatrix"},{"location":"api_modules/mlxtend.plotting/stacked_barplot/","text":"stacked_barplot stacked_barplot(df, bar_width='auto', colors='bgrcky', labels='index', rotation=90, legend_loc='best') Function to plot stacked barplots. Parameters df : pandas.DataFrame A pandas DataFrame where the index denotes the x-axis labels, and the columns contain the different measurements for each row. bar_width: 'auto' or float (default: 'auto') Parameter to set the widths of the bars. If 'auto', the width is automatically determined by the number of columns in the dataset. colors: str (default: 'bgrcky') The colors of the bars. labels: 'index' or iterable (default: 'index') If 'index', the DataFrame index will be used as x-tick labels. rotation: int (default: 90) Parameter to rotate the x-axis labels. 
legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/stacked_barplot/","title":"Stacked barplot"},{"location":"api_modules/mlxtend.plotting/stacked_barplot/#stacked_barplot","text":"stacked_barplot(df, bar_width='auto', colors='bgrcky', labels='index', rotation=90, legend_loc='best') Function to plot stacked barplots. Parameters df : pandas.DataFrame A pandas DataFrame where the index denotes the x-axis labels, and the columns contain the different measurements for each row. bar_width: 'auto' or float (default: 'auto') Parameter to set the widths of the bars. If 'auto', the width is automatically determined by the number of columns in the dataset. colors: str (default: 'bgrcky') The colors of the bars. labels: 'index' or iterable (default: 'index') If 'index', the DataFrame index will be used as x-tick labels. rotation: int (default: 90) Parameter to rotate the x-axis labels. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/stacked_barplot/","title":"stacked_barplot"},{"location":"api_modules/mlxtend.preprocessing/CopyTransformer/","text":"CopyTransformer CopyTransformer() Transformer that returns a copy of the input array. For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/CopyTransformer/ Methods fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a copy of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_copy : copy of the input X array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a copy of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_copy : copy of the input X array.","title":"CopyTransformer"},{"location":"api_modules/mlxtend.preprocessing/CopyTransformer/#copytransformer","text":"CopyTransformer() Transformer that returns a copy of the input array For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/CopyTransformer/","title":"CopyTransformer"},{"location":"api_modules/mlxtend.preprocessing/CopyTransformer/#methods","text":"fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a copy of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_copy : copy of the input X array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a copy of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_copy : copy of the input X array.","title":"Methods"},{"location":"api_modules/mlxtend.preprocessing/DenseTransformer/","text":"DenseTransformer DenseTransformer(return_copy=True) Convert a sparse array into a dense array. For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/DenseTransformer/ Methods fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a dense version of the input array. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_dense : dense version of the input X array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a dense version of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_dense : dense version of the input X array.","title":"DenseTransformer"},{"location":"api_modules/mlxtend.preprocessing/DenseTransformer/#densetransformer","text":"DenseTransformer(return_copy=True) Convert a sparse array into a dense array. For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/DenseTransformer/","title":"DenseTransformer"},{"location":"api_modules/mlxtend.preprocessing/DenseTransformer/#methods","text":"fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a dense version of the input array. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_dense : dense version of the input X array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a dense version of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_dense : dense version of the input X array.","title":"Methods"},{"location":"api_modules/mlxtend.preprocessing/MeanCenterer/","text":"MeanCenterer MeanCenterer() Column centering of vectors and matrices. Attributes col_means : numpy.ndarray [n_columns] NumPy array storing the mean values for centering after fitting the MeanCenterer object. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/MeanCenterer/ Methods fit(X) Gets the column means for mean centering. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns self fit_transform(X) Fits and transforms an arry. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features] A copy of the input array with the columns centered. transform(X) Centers a NumPy array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features] A copy of the input array with the columns centered.","title":"MeanCenterer"},{"location":"api_modules/mlxtend.preprocessing/MeanCenterer/#meancenterer","text":"MeanCenterer() Column centering of vectors and matrices. Attributes col_means : numpy.ndarray [n_columns] NumPy array storing the mean values for centering after fitting the MeanCenterer object. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/MeanCenterer/","title":"MeanCenterer"},{"location":"api_modules/mlxtend.preprocessing/MeanCenterer/#methods","text":"fit(X) Gets the column means for mean centering. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns self fit_transform(X) Fits and transforms an array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features] A copy of the input array with the columns centered. transform(X) Centers a NumPy array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features] A copy of the input array with the columns centered.","title":"Methods"},{"location":"api_modules/mlxtend.preprocessing/OnehotTransactions/","text":"OnehotTransactions OnehotTransactions(*args, **kwargs) Encoder class for transaction data in Python lists. Parameters None Attributes columns_: list List of unique names in the X input list of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/ Methods fit(X) Learn unique column names from transaction data. Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] fit_transform(X, sparse=False) Fit a TransactionEncoder encoder and transform a dataset. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. inverse_transform(array) Transforms an encoded NumPy array back into transactions. 
Parameters array : NumPy array [n_transactions, n_unique_items] The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'] Returns X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, sparse=False) Transform transactions into a one-hot encoded NumPy array. Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] sparse: bool (default=False) If True, transform will return Compressed Sparse Row matrix instead of the regular one. 
Returns array : NumPy array [n_transactions, n_unique_items] if sparse=False (default). Compressed Sparse Row matrix otherwise The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. Exact representation depends on the sparse argument For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']","title":"OnehotTransactions"},{"location":"api_modules/mlxtend.preprocessing/OnehotTransactions/#onehottransactions","text":"OnehotTransactions( args, * kwargs) Encoder class for transaction data in Python lists Parameters None Attributes columns_: list List of unique names in the X input list of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/","title":"OnehotTransactions"},{"location":"api_modules/mlxtend.preprocessing/OnehotTransactions/#methods","text":"fit(X) Learn unique column names from transaction DataFrame Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] fit_transform(X, sparse=False) Fit a TransactionEncoder encoder and transform a dataset. get_params(deep=True) Get parameters for this estimator. 
Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. inverse_transform(array) Transforms an encoded NumPy array back into transactions. Parameters array : NumPy array [n_transactions, n_unique_items] The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'] Returns X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, sparse=False) Transform transactions into a one-hot encoded NumPy array. Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. 
For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] sparse: bool (default=False) If True, transform will return Compressed Sparse Row matrix instead of the regular one. Returns array : NumPy array [n_transactions, n_unique_items] if sparse=False (default). Compressed Sparse Row matrix otherwise The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. Exact representation depends on the sparse argument For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']","title":"Methods"},{"location":"api_modules/mlxtend.preprocessing/TransactionEncoder/","text":"TransactionEncoder TransactionEncoder() Encoder class for transaction data in Python lists Parameters None Attributes columns_: list List of unique names in the X input list of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/ Methods fit(X) Learn unique column names from transaction DataFrame Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. 
For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] fit_transform(X, sparse=False) Fit a TransactionEncoder encoder and transform a dataset. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. inverse_transform(array) Transforms an encoded NumPy array back into transactions. Parameters array : NumPy array [n_transactions, n_unique_items] The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'] Returns X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). 
The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, sparse=False) Transform transactions into a one-hot encoded NumPy array. Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] sparse: bool (default=False) If True, transform will return Compressed Sparse Row matrix instead of the regular one. Returns array : NumPy array [n_transactions, n_unique_items] if sparse=False (default). Compressed Sparse Row matrix otherwise The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. Exact representation depends on the sparse argument For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']","title":"TransactionEncoder"},{"location":"api_modules/mlxtend.preprocessing/TransactionEncoder/#transactionencoder","text":"TransactionEncoder() Encoder class for transaction data in Python lists Parameters None Attributes columns_: list List of unique names in the X input list of lists Examples For usage examples, please see 
http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/","title":"TransactionEncoder"},{"location":"api_modules/mlxtend.preprocessing/TransactionEncoder/#methods","text":"fit(X) Learn unique column names from transaction DataFrame Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] fit_transform(X, sparse=False) Fit a TransactionEncoder encoder and transform a dataset. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. inverse_transform(array) Transforms an encoded NumPy array back into transactions. Parameters array : NumPy array [n_transactions, n_unique_items] The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'] Returns X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. 
For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, sparse=False) Transform transactions into a one-hot encoded NumPy array. Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] sparse: bool (default=False) If True, transform will return Compressed Sparse Row matrix instead of the regular one. Returns array : NumPy array [n_transactions, n_unique_items] if sparse=False (default). Compressed Sparse Row matrix otherwise The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. 
Exact representation depends on the sparse argument For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']","title":"Methods"},{"location":"api_modules/mlxtend.preprocessing/minmax_scaling/","text":"minmax_scaling minmax_scaling(array, columns, min_val=0, max_val=1) Min max scaling of pandas' DataFrames. Parameters array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]. columns : array-like, shape = [n_columns] Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...] min_val : int or float , optional (default= 0 ) minimum value after rescaling. max_val : int or float , optional (default= 1 ) maximum value after rescaling. Returns df_new : pandas DataFrame object. Copy of the array or DataFrame with rescaled columns. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/minmax_scaling/","title":"Minmax scaling"},{"location":"api_modules/mlxtend.preprocessing/minmax_scaling/#minmax_scaling","text":"minmax_scaling(array, columns, min_val=0, max_val=1) Min max scaling of pandas' DataFrames. Parameters array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]. columns : array-like, shape = [n_columns] Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...] min_val : int or float , optional (default= 0 ) minimum value after rescaling. max_val : int or float , optional (default= 1 ) maximum value after rescaling. Returns df_new : pandas DataFrame object. 
Copy of the array or DataFrame with rescaled columns. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/minmax_scaling/","title":"minmax_scaling"},{"location":"api_modules/mlxtend.preprocessing/one_hot/","text":"one_hot one_hot(y, num_labels='auto', dtype='float') One-hot encoding of class labels Parameters y : array-like, shape = [n_classlabels] Python list or numpy array consisting of class labels. num_labels : int or 'auto' Number of unique labels in the class label array. Infers the number of unique labels from the input array if set to 'auto'. dtype : str NumPy array type (float, float32, float64) of the output array. Returns ary : numpy.ndarray, shape = [n_classlabels] One-hot encoded array, where each sample is represented as a row vector in the returned array. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/","title":"One hot"},{"location":"api_modules/mlxtend.preprocessing/one_hot/#one_hot","text":"one_hot(y, num_labels='auto', dtype='float') One-hot encoding of class labels Parameters y : array-like, shape = [n_classlabels] Python list or numpy array consisting of class labels. num_labels : int or 'auto' Number of unique labels in the class label array. Infers the number of unique labels from the input array if set to 'auto'. dtype : str NumPy array type (float, float32, float64) of the output array. Returns ary : numpy.ndarray, shape = [n_classlabels] One-hot encoded array, where each sample is represented as a row vector in the returned array. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/","title":"one_hot"},{"location":"api_modules/mlxtend.preprocessing/shuffle_arrays_unison/","text":"shuffle_arrays_unison shuffle_arrays_unison(arrays, random_seed=None) Shuffle NumPy arrays in unison. Parameters arrays : array-like, shape = [n_arrays] A list of NumPy arrays. 
random_seed : int (default: None) Sets the random state. Returns shuffled_arrays : A list of NumPy arrays after shuffling. Examples >>> import numpy as np >>> from mlxtend.preprocessing import shuffle_arrays_unison >>> X1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> y1 = np.array([1, 2, 3]) >>> X2, y2 = shuffle_arrays_unison(arrays=[X1, y1], random_seed=3) >>> assert(X2.all() == np.array([[4, 5, 6], [1, 2, 3], [7, 8, 9]]).all()) >>> assert(y2.all() == np.array([2, 1, 3]).all()) >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/shuffle_arrays_unison/","title":"Shuffle arrays unison"},{"location":"api_modules/mlxtend.preprocessing/shuffle_arrays_unison/#shuffle_arrays_unison","text":"shuffle_arrays_unison(arrays, random_seed=None) Shuffle NumPy arrays in unison. Parameters arrays : array-like, shape = [n_arrays] A list of NumPy arrays. random_seed : int (default: None) Sets the random state. Returns shuffled_arrays : A list of NumPy arrays after shuffling. Examples >>> import numpy as np >>> from mlxtend.preprocessing import shuffle_arrays_unison >>> X1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> y1 = np.array([1, 2, 3]) >>> X2, y2 = shuffle_arrays_unison(arrays=[X1, y1], random_seed=3) >>> assert(X2.all() == np.array([[4, 5, 6], [1, 2, 3], [7, 8, 9]]).all()) >>> assert(y2.all() == np.array([2, 1, 3]).all()) >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/shuffle_arrays_unison/","title":"shuffle_arrays_unison"},{"location":"api_modules/mlxtend.preprocessing/standardize/","text":"standardize standardize(array, columns=None, ddof=0, return_params=False, params=None) Standardize columns in pandas DataFrames. Parameters array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]. columns : array-like, shape = [n_columns] (default: None) Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...] 
If None, standardizes all columns. ddof : int (default: 0) Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. return_params : dict (default: False) If set to True, a dictionary is returned in addition to the standardized array. The parameter dictionary contains the column means ('avgs') and standard deviations ('stds') of the individual columns. params : dict (default: None) A dictionary with column means and standard deviations as returned by the standardize function if return_params was set to True. If a params dictionary is provided, the standardize function will use these instead of computing them from the current array. Notes If all values in a given column are the same, these values are all set to 0.0 . The standard deviation in the parameters dictionary is consequently set to 1.0 to avoid dividing by zero. Returns df_new : pandas DataFrame object. Copy of the array or DataFrame with standardized columns. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/standardize/","title":"Standardize"},{"location":"api_modules/mlxtend.preprocessing/standardize/#standardize","text":"standardize(array, columns=None, ddof=0, return_params=False, params=None) Standardize columns in pandas DataFrames. Parameters array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]. columns : array-like, shape = [n_columns] (default: None) Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...] If None, standardizes all columns. ddof : int (default: 0) Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. return_params : dict (default: False) If set to True, a dictionary is returned in addition to the standardized array. The parameter dictionary contains the column means ('avgs') and standard deviations ('stds') of the individual columns. 
params : dict (default: None) A dictionary with column means and standard deviations as returned by the standardize function if return_params was set to True. If a params dictionary is provided, the standardize function will use these instead of computing them from the current array. Notes If all values in a given column are the same, these values are all set to 0.0 . The standard deviation in the parameters dictionary is consequently set to 1.0 to avoid dividing by zero. Returns df_new : pandas DataFrame object. Copy of the array or DataFrame with standardized columns. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/standardize/","title":"standardize"},{"location":"api_modules/mlxtend.regressor/LinearRegression/","text":"LinearRegression LinearRegression(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) Ordinary least squares linear regression. Parameters eta : float (default: 0.01) solver rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution) If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent learning If 1 < minibatches < len(y): Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr if not solver='normal equation' 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. 
cost_ : list Sum of squared errors after each epoch; ignored if solver='normal equation' Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/LinearRegression/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values.' adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. 
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause","title":"LinearRegression"},{"location":"api_modules/mlxtend.regressor/LinearRegression/#linearregression","text":"LinearRegression(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) Ordinary least squares linear regression. Parameters eta : float (default: 0.01) solver rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution) If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent learning If 1 < minibatches < len(y): Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr if not solver='normal equation' 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Sum of squared errors after each epoch; ignored if solver='normal equation' Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/LinearRegression/","title":"LinearRegression"},{"location":"api_modules/mlxtend.regressor/LinearRegression/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. 
init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values.' adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_modules/mlxtend.regressor/LinearRegression/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.regressor/LinearRegression/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. 
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.regressor/LinearRegression/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_modules/mlxtend.regressor/LinearRegression/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_modules/mlxtend.regressor/StackingCVRegressor/","text":"StackingCVRegressor StackingCVRegressor(regressors, meta_regressor, cv=5, shuffle=True, use_features_in_secondary=False, store_train_meta_features=False, refit=True) A 'Stacking Cross-Validation' regressor for scikit-learn estimators. New in mlxtend v0.7.0 Notes The StackingCVRegressor uses scikit-learn's check_cv internally, which doesn't support a random seed. Thus NumPy's random seed need to be specified explicitely for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVRegressor Parameters regressors : array-like, shape = [n_regressors] A list of regressors. Invoking the fit method on the StackingCVRegressor will fit clones of these original regressors that will be stored in the class attribute self.regr_ . meta_regressor : object The meta-regressor to be fitted on the ensemble of regressor cv : int, cross-validation generator or iterable, optional (default: 5) Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 5-fold cross validation, - integer, to specify the number of folds in a KFold , - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. 
For integer/None inputs, it will use KFold cross-validation use_features_in_secondary : bool (default: False) If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset. If False, the meta-regressor will be trained only on the predictions of the original regressors. shuffle : bool (default: True) If True, and the cv argument is integer, the training data will be shuffled at fitting stage prior to cross-validation. If the cv argument is a specific cross validation technique, this argument is omitted. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-regressor stored in the self.train_meta_features_ array, which can be accessed after calling fit . refit : bool (default: True) Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that are supporting the scikit-learn fit/predict API interface but are not compatible to scikit-learn's clone function. Attributes train_meta_features : numpy array, shape = [n_samples, n_regressors] meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/StackingCVRegressor/ Methods fit(X, y, groups=None, sample_weight=None) Fit ensemble regressors and the meta-regressor. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values. groups : numpy array/None, shape = [n_samples] The group that each sample belongs to. 
This is used by specific folding strategies such as GroupKFold(). sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each regressor in the regressors list as well as to the meta_regressor. Raises an error if a regressor does not support sample_weight in its fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns y_target : array-like, shape = [n_samples] or [n_samples, n_targets] Predicted target values. predict_meta_features(X) Get meta-features of test-data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, len(self.regressors)] meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. score(X, y, sample_weight=None) Returns the coefficient of determination R^2 of the prediction. 
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0. Parameters X : array-like, shape = (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float R^2 of self.predict(X) w.r.t. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self","title":"StackingCVRegressor"},{"location":"api_modules/mlxtend.regressor/StackingCVRegressor/#stackingcvregressor","text":"StackingCVRegressor(regressors, meta_regressor, cv=5, shuffle=True, use_features_in_secondary=False, store_train_meta_features=False, refit=True) A 'Stacking Cross-Validation' regressor for scikit-learn estimators. New in mlxtend v0.7.0 Notes The StackingCVRegressor uses scikit-learn's check_cv internally, which doesn't support a random seed. Thus NumPy's random seed needs to be specified explicitly for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVRegressor. Parameters regressors : array-like, shape = [n_regressors] A list of regressors. 
Invoking the fit method on the StackingCVRegressor will fit clones of these original regressors that will be stored in the class attribute self.regr_ . meta_regressor : object The meta-regressor to be fitted on the ensemble of regressors. cv : int, cross-validation generator or iterable, optional (default: 5) Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 5-fold cross validation, - integer, to specify the number of folds in a KFold , - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. For integer/None inputs, it will use KFold cross-validation. use_features_in_secondary : bool (default: False) If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset. If False, the meta-regressor will be trained only on the predictions of the original regressors. shuffle : bool (default: True) If True, and the cv argument is integer, the training data will be shuffled at the fitting stage prior to cross-validation. If the cv argument is a specific cross validation technique, this argument is ignored. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-regressor are stored in the self.train_meta_features_ array, which can be accessed after calling fit . refit : bool (default: True) Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. 
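The cv, shuffle and use_features_in_secondary parameters above all control the data the meta-regressor is trained on. The core idea is that each training sample's meta-feature comes from base models that never saw that sample. The sketch below shows this without scikit-learn. It is a simplified illustration under stated assumptions, not mlxtend's implementation. MeanRegressor, LineRegressor, kfold_indices and out_of_fold_meta_features are hypothetical names, and the toy base models take a single feature. The explicit seed argument stands in for the global NumPy seed that the note above recommends setting.

```python
# Dependency-free sketch of out-of-fold stacking meta-features.
# Not mlxtend's code: all class and function names here are hypothetical.
import random


class MeanRegressor:
    # Toy base model: always predicts the mean of its training targets.
    def fit(self, xs, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


class LineRegressor:
    # Toy base model: 1-D ordinary least squares, y = intercept + slope * x.
    def fit(self, xs, ys):
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope_ = cov / var if var else 0.0
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, xs):
        return [self.intercept_ + self.slope_ * x for x in xs]


def kfold_indices(n, k=5, shuffle=True, seed=0):
    # Partition range(n) into k disjoint test folds (like an integer cv).
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_meta_features(model_factories, xs, ys, k=5, shuffle=True,
                              seed=0, use_features_in_secondary=False):
    # Row i holds each base model's prediction for sample i, made by a
    # model that was fitted on the other folds only.
    n = len(xs)
    meta = [[0.0] * len(model_factories) for _ in range(n)]
    for test_idx in kfold_indices(n, k, shuffle, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        for j, make_model in enumerate(model_factories):
            model = make_model().fit([xs[i] for i in train_idx],
                                     [ys[i] for i in train_idx])
            preds = model.predict([xs[i] for i in test_idx])
            for i, p in zip(test_idx, preds):
                meta[i][j] = p
    if use_features_in_secondary:
        # Append the original feature so the meta-regressor sees both.
        meta = [row + [x] for row, x in zip(meta, xs)]
    return meta


xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
meta = out_of_fold_meta_features([MeanRegressor, LineRegressor], xs, ys)
print(meta[0])
```

In a full CV-stacking workflow, the meta-regressor would then be fitted on these rows against ys. The base regressors would also be refitted on the whole training set, so that predict can build the same kind of meta-features for new data.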
Attributes train_meta_features_ : numpy array, shape = [n_samples, n_regressors] Meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/StackingCVRegressor/","title":"StackingCVRegressor"},{"location":"api_modules/mlxtend.regressor/StackingCVRegressor/#methods","text":"fit(X, y, groups=None, sample_weight=None) Fit ensemble regressors and the meta-regressor. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values. groups : numpy array/None, shape = [n_samples] The group that each sample belongs to. This is used by specific folding strategies such as GroupKFold(). sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each regressor in the regressors list as well as to the meta_regressor. Raises an error if a regressor does not support sample_weight in its fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. predict(X) Predict target values for X. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns y_target : array-like, shape = [n_samples] or [n_samples, n_targets] Predicted target values. predict_meta_features(X) Get meta-features of test-data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, len(self.regressors)] meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. score(X, y, sample_weight=None) Returns the coefficient of determination R^2 of the prediction. The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0. Parameters X : array-like, shape = (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float R^2 of self.predict(X) w.r.t. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. 
Returns self","title":"Methods"},{"location":"api_modules/mlxtend.regressor/StackingRegressor/","text":"StackingRegressor StackingRegressor(regressors, meta_regressor, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, refit=True) A Stacking regressor for scikit-learn estimators for regression. Parameters regressors : array-like, shape = [n_regressors] A list of regressors. Invoking the fit method on the StackingRegressor will fit clones of those original regressors that will be stored in the class attribute self.regr_ . meta_regressor : object The meta-regressor to be fitted on the ensemble of regressors. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the regressor being fitted - verbose=2 : Prints info about the parameters of the regressor being fitted - verbose>2 : Changes verbose param of the underlying regressor to self.verbose - 2 use_features_in_secondary : bool (default: False) If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset. If False, the meta-regressor will be trained only on the predictions of the original regressors. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-regressor are stored in the self.train_meta_features_ array, which can be accessed after calling fit . 
refit : bool (default: True) Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes regr_ : list, shape=[n_regressors] Fitted regressors (clones of the original regressors). meta_regr_ : estimator Fitted meta-regressor (clone of the original meta-estimator). coef_ : array-like, shape = [n_features] Model coefficients of the fitted meta-estimator. intercept_ : float Intercept of the fitted meta-estimator. train_meta_features_ : numpy array, shape = [n_samples, len(self.regressors)] Meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/StackingRegressor/ Methods fit(X, y, sample_weight=None) Learn weight coefficients from training data for each regressor. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_targets] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each regressor in the regressors list as well as to the meta_regressor. Raises an error if a regressor does not support sample_weight in its fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. 
Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns y_target : array-like, shape = [n_samples] or [n_samples, n_targets] Predicted target values. predict_meta_features(X) Get meta-features of test-data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, len(self.regressors)] meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. score(X, y, sample_weight=None) Returns the coefficient of determination R^2 of the prediction. The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0. Parameters X : array-like, shape = (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float R^2 of self.predict(X) w.r.t. y. set_params(**params) Set the parameters of this estimator. 
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self Properties coef_ None intercept_ None","title":"StackingRegressor"},{"location":"api_modules/mlxtend.regressor/StackingRegressor/#stackingregressor","text":"StackingRegressor(regressors, meta_regressor, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, refit=True) A Stacking regressor for scikit-learn estimators for regression. Parameters regressors : array-like, shape = [n_regressors] A list of regressors. Invoking the fit method on the StackingRegressor will fit clones of those original regressors that will be stored in the class attribute self.regr_ . meta_regressor : object The meta-regressor to be fitted on the ensemble of regressors. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the regressor being fitted - verbose=2 : Prints info about the parameters of the regressor being fitted - verbose>2 : Changes verbose param of the underlying regressor to self.verbose - 2 use_features_in_secondary : bool (default: False) If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset. If False, the meta-regressor will be trained only on the predictions of the original regressors. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-regressor are stored in the self.train_meta_features_ array, which can be accessed after calling fit . 
refit : bool (default: True) Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes regr_ : list, shape=[n_regressors] Fitted regressors (clones of the original regressors). meta_regr_ : estimator Fitted meta-regressor (clone of the original meta-estimator). coef_ : array-like, shape = [n_features] Model coefficients of the fitted meta-estimator. intercept_ : float Intercept of the fitted meta-estimator. train_meta_features_ : numpy array, shape = [n_samples, len(self.regressors)] Meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/StackingRegressor/","title":"StackingRegressor"},{"location":"api_modules/mlxtend.regressor/StackingRegressor/#methods","text":"fit(X, y, sample_weight=None) Learn weight coefficients from training data for each regressor. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_targets] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each regressor in the regressors list as well as to the meta_regressor. Raises an error if a regressor does not support sample_weight in its fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. 
Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns y_target : array-like, shape = [n_samples] or [n_samples, n_targets] Predicted target values. predict_meta_features(X) Get meta-features of test-data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, len(self.regressors)] meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. score(X, y, sample_weight=None) Returns the coefficient of determination R^2 of the prediction. The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0. Parameters X : array-like, shape = (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. 
Returns score : float R^2 of self.predict(X) w.r.t. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self","title":"Methods"},{"location":"api_modules/mlxtend.regressor/StackingRegressor/#properties","text":"coef_ None intercept_ None","title":"Properties"},{"location":"api_modules/mlxtend.text/generalize_names/","text":"generalize_names generalize_names(name, output_sep=' ', firstname_output_letters=1) Generalize a person's first and last name. Returns a person's name as the last name, followed by output_sep and the first firstname_output_letters letters of the first name (all lowercase). Parameters name : str Name of the person. output_sep : str (default: ' ') String for separating last name and first name in the output. firstname_output_letters : int (default: 1) Number of letters in the abbreviated first name. Returns gen_name : str The generalized name. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/generalize_names/","title":"Generalize names"},{"location":"api_modules/mlxtend.text/generalize_names/#generalize_names","text":"generalize_names(name, output_sep=' ', firstname_output_letters=1) Generalize a person's first and last name. Returns a person's name as the last name, followed by output_sep and the first firstname_output_letters letters of the first name (all lowercase). Parameters name : str Name of the person. output_sep : str (default: ' ') String for separating last name and first name in the output. firstname_output_letters : int (default: 1) Number of letters in the abbreviated first name. Returns gen_name : str The generalized name. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/generalize_names/","title":"generalize_names"},{"location":"api_modules/mlxtend.text/generalize_names_duplcheck/","text":"generalize_names_duplcheck generalize_names_duplcheck(df, col_name) Generalizes names and removes duplicates. 
Applies mlxtend.text.generalize_names to a DataFrame with 1 first name letter by default and uses more first name letters if duplicates are detected. Parameters df : pandas.DataFrame DataFrame that contains a column where generalize_names should be applied. col_name : str Name of the DataFrame column to which the generalize_names function should be applied. Returns df_new : pandas.DataFrame New DataFrame object where the generalize_names function has been applied without duplicates. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/generalize_names_duplcheck/","title":"Generalize names duplcheck"},{"location":"api_modules/mlxtend.text/generalize_names_duplcheck/#generalize_names_duplcheck","text":"generalize_names_duplcheck(df, col_name) Generalizes names and removes duplicates. Applies mlxtend.text.generalize_names to a DataFrame with 1 first name letter by default and uses more first name letters if duplicates are detected. Parameters df : pandas.DataFrame DataFrame that contains a column where generalize_names should be applied. col_name : str Name of the DataFrame column to which the generalize_names function should be applied. Returns df_new : pandas.DataFrame New DataFrame object where the generalize_names function has been applied without duplicates. 
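The duplicate-resolving idea described above is to start with one first-name letter and add letters only where generalized names collide. It can be sketched without pandas as follows. This is a hypothetical, simplified re-implementation that works on a plain list of names. generalize_name and generalize_names_dedup are illustrative names. mlxtend's actual functions operate on DataFrames and may normalize names differently, for example when handling punctuation or accents.

```python
# Hypothetical, simplified sketch; not mlxtend's generalize_names code.


def generalize_name(name, output_sep=' ', firstname_output_letters=1):
    # 'First [Middle] Last' or 'Last, First' -> 'last<sep>f...' (lowercase).
    name = name.strip().lower()
    if ',' in name:
        last, first = [part.strip() for part in name.split(',', 1)]
    else:
        parts = name.split()
        first, last = parts[0], parts[-1]
    return last + output_sep + first[:firstname_output_letters]


def generalize_names_dedup(names):
    # Use 1 first-name letter by default; add letters only while the
    # short form collides with a different full name in the list.
    result = []
    for name in names:
        letters = 1
        gen = generalize_name(name, firstname_output_letters=letters)
        while letters < len(name) and any(
                other != name and
                generalize_name(other, firstname_output_letters=letters) == gen
                for other in names):
            letters += 1
            gen = generalize_name(name, firstname_output_letters=letters)
        result.append(gen)
    return result


print(generalize_names_dedup(['Marc Smith', 'Mary Smith', 'Lionel Messi']))
# ['smith marc', 'smith mary', 'messi l']
```

Re-checking every other name for each entry costs quadratic time. This is fine for an illustration, but a grouping step (for example, a dictionary keyed by the current short form) would scale better on large tables.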
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/generalize_names_duplcheck/","title":"generalize_names_duplcheck"},{"location":"api_modules/mlxtend.text/tokenizer_emoticons/","text":"tokenizer_emoticons tokenizer_emoticons(text) Return emoticons from text Examples >>> tokenizer_emoticons('This :) is :( a test :-)!') [':)', ':(', ':-)'] For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/tokenizer_emoticons/","title":"Tokenizer emoticons"},{"location":"api_modules/mlxtend.text/tokenizer_emoticons/#tokenizer_emoticons","text":"tokenizer_emoticons(text) Return emoticons from text Examples >>> tokenizer_emoticons('This :) is :( a test :-)!') [':)', ':(', ':-)'] For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/tokenizer_emoticons/","title":"tokenizer_emoticons"},{"location":"api_modules/mlxtend.text/tokenizer_words_and_emoticons/","text":"tokenizer_words_and_emoticons tokenizer_words_and_emoticons(text) Convert text to lowercase words and emoticons. Examples >>> tokenizer_words_and_emoticons('This :) is :( a test :-)!') ['this', 'is', 'a', 'test', ':)', ':(', ':-)'] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/tokenizer_words_and_emoticons/","title":"Tokenizer words and emoticons"},{"location":"api_modules/mlxtend.text/tokenizer_words_and_emoticons/#tokenizer_words_and_emoticons","text":"tokenizer_words_and_emoticons(text) Convert text to lowercase words and emoticons. Examples >>> tokenizer_words_and_emoticons('This :) is :( a test :-)!') ['this', 'is', 'a', 'test', ':)', ':(', ':-)'] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/tokenizer_words_and_emoticons/","title":"tokenizer_words_and_emoticons"},{"location":"api_modules/mlxtend.utils/Counter/","text":"Counter Counter(stderr=False, start_newline=True, precision=0, name=None) Class to display the progress of for-loop iterators. 
Parameters stderr : bool (default: False) Prints output to sys.stderr if True; uses sys.stdout otherwise. start_newline : bool (default: True) Prepends a new line to the counter, which prevents overwriting counters if multiple counters are printed in succession. precision : int (default: 0) Sets the number of decimal places when displaying the time elapsed in seconds. name : string (default: None) Prepends the specified name before the counter to allow distinguishing between multiple counters. Attributes curr_iter : int The current iteration. start_time : float The system's time in seconds when the Counter was initialized. end_time : float The system's time in seconds when the Counter was last updated. Examples >>> cnt = Counter() >>> for i in range(20): ... # do some computation ... time.sleep(0.1) ... cnt.update() 20 iter | 2 sec >>> print('The counter was initialized' ' %d seconds ago.' % (time.time() - cnt.start_time)) The counter was initialized 2 seconds ago. >>> print('The counter was last updated' ' %d seconds ago.' % (time.time() - cnt.end_time)) The counter was last updated 0 seconds ago. For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/utils/Counter/ Methods update() Print current iteration and time elapsed.","title":"Counter"},{"location":"api_modules/mlxtend.utils/Counter/#counter","text":"Counter(stderr=False, start_newline=True, precision=0, name=None) Class to display the progress of for-loop iterators. Parameters stderr : bool (default: False) Prints output to sys.stderr if True; uses sys.stdout otherwise. start_newline : bool (default: True) Prepends a new line to the counter, which prevents overwriting counters if multiple counters are printed in succession. precision : int (default: 0) Sets the number of decimal places when displaying the time elapsed in seconds. name : string (default: None) Prepends the specified name before the counter to allow distinguishing between multiple counters. 
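The parameters above (stderr, start_newline, precision, name) map onto only a little code. The following is a hypothetical stand-in, SimpleCounter, written with the standard library alone. Its extra stream argument is an assumption added so output can be captured. mlxtend's Counter may format or flush its output differently.

```python
# Hypothetical stand-in for a progress counter; not mlxtend.utils.Counter.
import sys
import time


class SimpleCounter:
    def __init__(self, stderr=False, start_newline=True, precision=0,
                 name=None, stream=None):
        # stream is an extra, hypothetical argument (e.g. io.StringIO).
        if stream is None:
            stream = sys.stderr if stderr else sys.stdout
        self.stream = stream
        self.precision = precision
        self.prefix = name + ': ' if name else ''
        self.curr_iter = 0
        self.start_time = time.time()
        self.end_time = self.start_time
        if start_newline:
            print(file=self.stream)  # start on a fresh line

    def update(self):
        # Advance the counter and redraw it as 'N iter | T sec'.
        self.curr_iter += 1
        self.end_time = time.time()
        elapsed = self.end_time - self.start_time
        line = '%s%d iter | %.*f sec' % (self.prefix, self.curr_iter,
                                         self.precision, elapsed)
        # chr(13) is a carriage return, so each update overwrites the last.
        self.stream.write(chr(13) + line)
        self.stream.flush()


cnt = SimpleCounter(name='demo')
for _ in range(20):
    time.sleep(0.01)  # stand-in for real work
    cnt.update()
print()
```

The carriage return lets a single terminal line be redrawn in place. start_newline only keeps the first redraw from overwriting whatever was printed before it.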
Attributes curr_iter : int The current iteration. start_time : float The system's time in seconds when the Counter was initialized. end_time : float The system's time in seconds when the Counter was last updated. Examples >>> cnt = Counter() >>> for i in range(20): ... # do some computation ... time.sleep(0.1) ... cnt.update() 20 iter | 2 sec >>> print('The counter was initialized' ' %d seconds ago.' % (time.time() - cnt.start_time)) The counter was initialized 2 seconds ago. >>> print('The counter was last updated' ' %d seconds ago.' % (time.time() - cnt.end_time)) The counter was last updated 0 seconds ago. For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/utils/Counter/","title":"Counter"},{"location":"api_modules/mlxtend.utils/Counter/#methods","text":"update() Print current iteration and time elapsed.","title":"Methods"},{"location":"api_modules/mlxtend.utils/assert_raises/","text":"assert_raises assert_raises(exception_type, message, func, *args, **kwargs) Check that an exception is raised with a specific message. Parameters exception_type : exception The exception that should be raised. message : str (default: None) The error message that should be raised. Ignored if False or None. func : callable The function that raises the exception. *args : positional arguments to func . **kwargs : keyword arguments to func","title":"Assert raises"},{"location":"api_modules/mlxtend.utils/assert_raises/#assert_raises","text":"assert_raises(exception_type, message, func, *args, **kwargs) Check that an exception is raised with a specific message. Parameters exception_type : exception The exception that should be raised. message : str (default: None) The error message that should be raised. Ignored if False or None. func : callable The function that raises the exception. *args : positional arguments to func . 
**kwargs : keyword arguments to func","title":"assert_raises"},{"location":"api_modules/mlxtend.utils/check_Xy/","text":"check_Xy check_Xy(X, y, y_int=True) None","title":"check Xy"},{"location":"api_modules/mlxtend.utils/check_Xy/#check_xy","text":"check_Xy(X, y, y_int=True) None","title":"check_Xy"},{"location":"api_modules/mlxtend.utils/format_kwarg_dictionaries/","text":"format_kwarg_dictionaries format_kwarg_dictionaries(default_kwargs=None, user_kwargs=None, protected_keys=None) Function to combine default and user specified kwargs dictionaries Parameters default_kwargs : dict, optional Default kwargs (default is None). user_kwargs : dict, optional User specified kwargs (default is None). protected_keys : array_like, optional Sequence of keys to be removed from the returned dictionary (default is None). Returns formatted_kwargs : dict Formatted kwargs dictionary.","title":"Format kwarg dictionaries"},{"location":"api_modules/mlxtend.utils/format_kwarg_dictionaries/#format_kwarg_dictionaries","text":"format_kwarg_dictionaries(default_kwargs=None, user_kwargs=None, protected_keys=None) Function to combine default and user specified kwargs dictionaries Parameters default_kwargs : dict, optional Default kwargs (default is None). user_kwargs : dict, optional User specified kwargs (default is None). protected_keys : array_like, optional Sequence of keys to be removed from the returned dictionary (default is None). Returns formatted_kwargs : dict Formatted kwargs dictionary.","title":"format_kwarg_dictionaries"},{"location":"api_subpackages/mlxtend._base/","text":"mlxtend version: 0.14.0dev","title":"Mlxtend. base"},{"location":"api_subpackages/mlxtend.classifier/","text":"mlxtend version: 0.14.0dev Adaline Adaline(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) ADAptive LInear NEuron classifier. Note that this implementation of Adaline expects binary class labels in {0, 1}. 
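The minibatches settings in the parameter list that follows distinguish a closed-form fit (minibatches=None) from full-batch gradient descent (minibatches=1). The sketch below shows both for a single feature with labels in {0, 1}, using only the standard library. adaline_normal_equation, adaline_gd and predict_labels are hypothetical helper names, not mlxtend's API. The sketch leaves out shuffling, SGD/minibatch updates, random weight initialization and print_progress.

```python
# Sketch of Adaline on one feature; hypothetical helpers, not mlxtend's code.


def adaline_normal_equation(xs, ys):
    # Closed form (minibatches=None): least-squares slope and intercept.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return w, my - w * mx


def adaline_gd(xs, ys, eta=0.01, epochs=50):
    # Full-batch gradient descent (minibatches=1) on SSE / 2.
    # cost[k] is SSE / 2 measured before the update of epoch k.
    w, b, cost = 0.0, 0.0, []
    for _ in range(epochs):
        errors = [y - (w * x + b) for x, y in zip(xs, ys)]
        cost.append(sum(e * e for e in errors) / 2.0)
        w += eta * sum(e * x for e, x in zip(errors, xs))
        b += eta * sum(errors)
    return w, b, cost


def predict_labels(w, b, xs):
    # Labels in {0, 1}: threshold the linear activation at 0.5.
    return [1 if w * x + b >= 0.5 else 0 for x in xs]


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = adaline_normal_equation(xs, ys)
print(predict_labels(w, b, xs))  # [0, 0, 0, 1, 1, 1]
w_gd, b_gd, cost = adaline_gd(xs, ys, eta=0.01, epochs=2000)
print(round(w_gd, 4), round(b_gd, 4))
```

On this toy data, eta=0.01 with the default 50 epochs leaves the gradient-descent weights still noticeably away from the closed-form solution. Many more epochs are needed to converge, which is one reason a closed-form option is convenient for small problems.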
Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0). epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution). If 1: Gradient Descent learning. If len(y): Stochastic Gradient Descent (SGD) online learning. If 1 < minibatches < len(y): SGD Minibatch learning. random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr unless the normal equation (closed-form) solver is used. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Sum of squared errors after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Adaline/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. 
adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause EnsembleVoteClassifier EnsembleVoteClassifier(clfs, voting='hard', weights=None, verbose=0, refit=True) Soft Voting/Majority Rule classifier for scikit-learn estimators. Parameters clfs : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the VotingClassifier will fit clones of those original classifiers that will be stored in the class attribute self.clfs_ if refit=True (default). voting : str, {'hard', 'soft'} (default='hard') If 'hard', uses predicted class labels for majority rule voting. Else if 'soft', predicts the class label based on the argmax of the sums of the predicted probalities, which is recommended for an ensemble of well-calibrated classifiers. 
weights : array-like, shape = [n_classifiers], optional (default=None) Sequence of weights (float or int) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the clf being fitted - verbose=2 : Prints info about the parameters of the clf being fitted - verbose>2 : Changes verbose param of the underlying clf to self.verbose - 2 refit : bool (default: True) Refits classifiers in clfs if True; otherwise, uses references to the clfs (assumes that the classifiers were already fit). Note: refit=False is incompatible with most scikit-learn wrappers! For instance, any form of cross-validation requires re-fitting the classifiers on the training folds, which would raise a NotFittedError if refit=False. (New in mlxtend v0.6.) Attributes classes_ : array-like, shape = [n_predictions] clfs : array-like, shape = [n_predictions] The unmodified input classifiers clfs_ : array-like, shape = [n_predictions] Fitted clones of the input classifiers Examples >>> import numpy as np >>> from sklearn.linear_model import LogisticRegression >>> from sklearn.naive_bayes import GaussianNB >>> from sklearn.ensemble import RandomForestClassifier >>> from mlxtend.classifier import EnsembleVoteClassifier >>> clf1 = LogisticRegression(random_state=1) >>> clf2 = RandomForestClassifier(random_state=1) >>> clf3 = GaussianNB() >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) >>> y = np.array([1, 1, 1, 2, 2, 2]) >>> eclf1 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ...
voting='hard', verbose=1) >>> eclf1 = eclf1.fit(X, y) >>> print(eclf1.predict(X)) [1 1 1 2 2 2] >>> eclf2 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting='soft') >>> eclf2 = eclf2.fit(X, y) >>> print(eclf2.predict(X)) [1 1 1 2 2 2] >>> eclf3 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... voting='soft', weights=[2, 1, 1]) >>> eclf3 = eclf3.fit(X, y) >>> print(eclf3.predict(X)) [1 1 1 2 2 2] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/ Methods fit(X, y, sample_weight=None) Fit each classifier in the ensemble on the training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weights to each classifier in the clfs list. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict class labels for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns maj : array-like, shape = [n_samples] Predicted class labels. predict_proba(X) Predict class probabilities for X.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns avg : array-like, shape = [n_samples, n_classes] Weighted average probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) with respect to y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self transform(X) Return class labels or probabilities for X for each estimator. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns If voting='soft': array-like, shape = [n_classifiers, n_samples, n_classes] Class probabilities calculated by each classifier. If voting='hard': array-like, shape = [n_classifiers, n_samples] Class labels predicted by each classifier. LogisticRegression LogisticRegression(eta=0.01, epochs=50, l2_lambda=0.0, minibatches=1, random_seed=None, print_progress=0) Logistic regression classifier. Note that this implementation of Logistic Regression expects binary class labels in {0, 1}. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset.
Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2_lambda : float (default: 0.0) Regularization parameter for L2 regularization. No regularization if l2_lambda=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list List of floats with the cross-entropy cost (sgd or gd) for every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/LogisticRegression/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set to False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class 1 probability : array-like, shape = [n_samples] score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause MultiLayerPerceptron MultiLayerPerceptron(eta=0.5, epochs=50, hidden_layers=[50], n_classes=None, momentum=0.0, l1=0.0, l2=0.0, dropout=1.0, decrease_const=0.0, minibatches=1, random_seed=None, print_progress=0) Multi-layer perceptron classifier with logistic sigmoid activations. Parameters eta : float (default: 0.5) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. hidden_layers : list (default: [50]) Number of units per hidden layer.
By default, 50 units in the first hidden layer. At the moment, only 1 hidden layer is supported. n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. l1 : float (default: 0.0) L1 regularization strength. l2 : float (default: 0.0) L2 regularization strength. momentum : float (default: 0.0) Momentum constant. Factor multiplied with the gradient of the previous epoch t-1 to improve learning speed: w(t) := w(t) - (grad(t) + momentum * grad(t-1)). decrease_const : float (default: 0.0) Decrease constant. Shrinks the learning rate after each epoch via eta / (1 + epoch*decrease_const). minibatches : int (default: 1) Divide the training data into k minibatches for accelerated stochastic gradient descent learning. Gradient Descent Learning if minibatches = 1 Stochastic Gradient Descent learning if minibatches = len(y) Minibatch learning if 1 < minibatches < len(y) random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape=[n_features, n_classes] Weights after fitting. b_ : 1d-array, shape=[n_classes] Bias units after fitting. cost_ : list List of floats; the mean categorical cross-entropy cost after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/MultiLayerPerceptron/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting.
Set to False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape = [n_samples, n_classes] score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object.
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause Perceptron Perceptron(eta=0.1, epochs=50, random_seed=None, print_progress=0) Perceptron classifier. Note that this implementation of the Perceptron expects binary class labels in {0, 1}. Parameters eta : float (default: 0.1) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Number of passes over the training dataset. Prior to each epoch, the dataset is shuffled to prevent cycles. random_seed : int (default: None) Random state for initializing random weights and shuffling. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Number of misclassifications in every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Perceptron/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set to False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values.
adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause SoftmaxRegression SoftmaxRegression(eta=0.01, epochs=50, l2=0.0, minibatches=1, n_classes=None, random_seed=None, print_progress=0) Softmax regression classifier. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2 : float (default: 0.0) Regularization parameter for L2 regularization. No regularization if l2=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization.
If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, n_classes} Model weights after fitting. b_ : 1d-array, shape={n_classes,} Bias units after fitting. cost_ : list List of floats, the average cross-entropy for each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/SoftmaxRegression/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set to False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape = [n_samples, n_classes] score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause StackingCVClassifier StackingCVClassifier(classifiers, meta_classifier, use_probas=False, cv=2, use_features_in_secondary=False, stratify=True, shuffle=True, verbose=0, store_train_meta_features=False, use_clones=True) A 'Stacking Cross-Validation' classifier for scikit-learn estimators. New in mlxtend v0.4.3. Notes The StackingCVClassifier uses scikit-learn's check_cv internally, which doesn't support a random seed.
Thus, NumPy's random seed needs to be specified explicitly for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVClassifier. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingCVClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_. meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers. use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. cv : int, cross-validation generator or an iterable, optional (default: 2) Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 2-fold cross validation, - integer, to specify the number of folds in a (Stratified)KFold, - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. For integer/None inputs, it will use either a KFold or StratifiedKFold cross validation depending on the value of the stratify argument. use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. stratify : bool (default: True) If True, and the cv argument is an integer, it will follow a stratified K-Fold cross validation technique. If the cv argument is a specific cross validation technique, this argument is ignored. shuffle : bool (default: True) If True, and the cv argument is an integer, the training data will be shuffled at the fitting stage prior to cross-validation. If the cv argument is a specific cross validation technique, this argument is ignored. verbose : int, optional (default=0) Controls the verbosity of the building process.
- verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted and which fold is currently being used for fitting - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingCVClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] Meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/ Methods fit(X, y, groups=None, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values. groups : numpy array/None, shape = [n_samples] The group that each sample belongs to.
This is used by specific folding strategies such as GroupKFold(). sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weights to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] Predicted class labels. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be correctly predicted.
Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) with respect to y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self StackingClassifier StackingClassifier(classifiers, meta_classifier, use_probas=False, average_probas=False, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, use_clones=True) A Stacking classifier for scikit-learn estimators for classification. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_. meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers. use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. average_probas : bool (default: False) Averages the probabilities as meta-features if True. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers.
store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] Meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/ Methods fit(X, y, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_outputs] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weights to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] or [n_samples, n_outputs] Predicted class labels. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] or a list of n_outputs of such arrays if n_outputs > 1. Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) with respect to y.
set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self","title":"Mlxtend.classifier"},{"location":"api_subpackages/mlxtend.classifier/#adaline","text":"Adaline(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) ADAptive LInear NEuron classifier. Note that this implementation of Adaline expects binary class labels in {0, 1}. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution) If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr if not solver='normal equation' 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Sum of squared errors after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Adaline/","title":"Adaline"},{"location":"api_subpackages/mlxtend.classifier/#methods","text":"fit(X, y, init_params=True) Learn model from training data. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. 
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#ensemblevoteclassifier","text":"EnsembleVoteClassifier(clfs, voting='hard', weights=None, verbose=0, refit=True) Soft Voting/Majority Rule classifier for scikit-learn estimators. Parameters clfs : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the EnsembleVoteClassifier will fit clones of those original classifiers that will be stored in the class attribute self.clfs_ if refit=True (default). voting : str, {'hard', 'soft'} (default='hard') If 'hard', uses predicted class labels for majority rule voting. Else if 'soft', predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. weights : array-like, shape = [n_classifiers], optional (default=None) Sequence of weights (float or int) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None. verbose : int, optional (default=0) Controls the verbosity of the building process. 
- verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the clf being fitted - verbose=2 : Prints info about the parameters of the clf being fitted - verbose>2 : Changes verbose param of the underlying clf to self.verbose - 2 refit : bool (default: True) Refits classifiers in clfs if True; otherwise uses references to the clfs (assumes that the classifiers were already fit). Note: refit=False is incompatible with most scikit-learn wrappers! For instance, if any form of cross-validation is performed, this would require re-fitting the classifiers on the training folds, which would raise a NotFittedError if refit=False. (New in mlxtend v0.6.) Attributes classes_ : array-like, shape = [n_predictions] clf : array-like, shape = [n_predictions] The unmodified input classifiers clf_ : array-like, shape = [n_predictions] Fitted clones of the input classifiers Examples >>> import numpy as np >>> from sklearn.linear_model import LogisticRegression >>> from sklearn.naive_bayes import GaussianNB >>> from sklearn.ensemble import RandomForestClassifier >>> from mlxtend.classifier import EnsembleVoteClassifier >>> clf1 = LogisticRegression(random_state=1) >>> clf2 = RandomForestClassifier(random_state=1) >>> clf3 = GaussianNB() >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) >>> y = np.array([1, 1, 1, 2, 2, 2]) >>> eclf1 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... voting='hard', verbose=1) >>> eclf1 = eclf1.fit(X, y) >>> print(eclf1.predict(X)) [1 1 1 2 2 2] >>> eclf2 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting='soft') >>> eclf2 = eclf2.fit(X, y) >>> print(eclf2.predict(X)) [1 1 1 2 2 2] >>> eclf3 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... 
voting='soft', weights=[2,1,1]) >>> eclf3 = eclf3.fit(X, y) >>> print(eclf3.predict(X)) [1 1 1 2 2 2] >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/","title":"EnsembleVoteClassifier"},{"location":"api_subpackages/mlxtend.classifier/#methods_1","text":"fit(X, y, sample_weight=None) Learn weight coefficients from training data for each classifier. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weights to each classifier in the clfs list. Raises error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict class labels for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns maj : array-like, shape = [n_samples] Predicted class labels. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns avg : array-like, shape = [n_samples, n_classes] Weighted average probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X) Return class labels or probabilities for X for each estimator. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns If voting='soft' : array-like = [n_classifiers, n_samples, n_classes] Class probabilities calculated by each classifier. If voting='hard' : array-like = [n_classifiers, n_samples] Class labels predicted by each classifier.","title":"Methods"},{"location":"api_subpackages/mlxtend.classifier/#logisticregression","text":"LogisticRegression(eta=0.01, epochs=50, l2_lambda=0.0, minibatches=1, random_seed=None, print_progress=0) Logistic regression classifier. Note that this implementation of Logistic Regression expects binary class labels in {0, 1}. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. 
l2_lambda : float Regularization parameter for L2 regularization. No regularization if l2_lambda=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list List of floats with cross_entropy cost (sgd or gd) for every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/LogisticRegression/","title":"LogisticRegression"},{"location":"api_subpackages/mlxtend.classifier/#methods_2","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. 
adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_2","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause_2","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class 1 probability : float score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. 
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_3","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause_3","text":"","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#multilayerperceptron","text":"MultiLayerPerceptron(eta=0.5, epochs=50, hidden_layers=[50], n_classes=None, momentum=0.0, l1=0.0, l2=0.0, dropout=1.0, decrease_const=0.0, minibatches=1, random_seed=None, print_progress=0) Multi-layer perceptron classifier with logistic sigmoid activations. Parameters eta : float (default: 0.5) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. hidden_layers : list (default: [50]) Number of units per hidden layer. By default 50 units in the first hidden layer. At the moment only 1 hidden layer is supported. n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. l1 : float (default: 0.0) L1 regularization strength. l2 : float (default: 0.0) L2 regularization strength. momentum : float (default: 0.0) Momentum constant. Factor multiplied with the gradient of the previous epoch t-1 to improve learning speed w(t) := w(t) - (grad(t) + momentum * grad(t-1)) decrease_const : float (default: 0.0) Decrease constant. 
Shrinks the learning rate after each epoch via eta / (1 + epoch*decrease_const). minibatches : int (default: 1) Divide the training data into k minibatches for accelerated stochastic gradient descent learning. Gradient Descent Learning if minibatches = 1 Stochastic Gradient Descent learning if minibatches = len(y) Minibatch learning if 1 < minibatches < len(y) random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape=[n_features, n_classes] Weights after fitting. b_ : 1D-array, shape=[n_classes] Bias units after fitting. cost_ : list List of floats; the mean categorical cross entropy cost after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/MultiLayerPerceptron/","title":"MultiLayerPerceptron"},{"location":"api_subpackages/mlxtend.classifier/#methods_3","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. 
adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_4","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause_4","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape= [n_samples, n_classes] score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. 
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_5","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause_5","text":"","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#perceptron","text":"Perceptron(eta=0.1, epochs=50, random_seed=None, print_progress=0) Perceptron classifier. Note that this implementation of the Perceptron expects binary class labels in {0, 1}. Parameters eta : float (default: 0.1) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Number of passes over the training dataset. Prior to each epoch, the dataset is shuffled to prevent cycles. random_seed : int Random state for initializing random weights and shuffling. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Number of misclassifications in every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Perceptron/","title":"Perceptron"},{"location":"api_subpackages/mlxtend.classifier/#methods_4","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. 
init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_6","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause_6","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. 
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_7","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause_7","text":"","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#softmaxregression","text":"SoftmaxRegression(eta=0.01, epochs=50, l2=0.0, minibatches=1, n_classes=None, random_seed=None, print_progress=0) Softmax regression classifier. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2 : float Regularization parameter for L2 regularization. No regularization if l2=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, n_classes} Model weights after fitting. b_ : 1d-array, shape={n_classes,} Bias units after fitting. 
cost_ : list List of floats, the average cross_entropy for each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/SoftmaxRegression/","title":"SoftmaxRegression"},{"location":"api_subpackages/mlxtend.classifier/#methods_5","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_8","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause_8","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape= [n_samples, n_classes] score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score). set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_9","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.classifier/#license-bsd-3-clause_9","text":"","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.classifier/#stackingcvclassifier","text":"StackingCVClassifier(classifiers, meta_classifier, use_probas=False, cv=2, use_features_in_secondary=False, stratify=True, shuffle=True, verbose=0, store_train_meta_features=False, use_clones=True) A 'Stacking Cross-Validation' classifier for scikit-learn estimators. New in mlxtend v0.4.3. Notes The StackingCVClassifier uses scikit-learn's check_cv internally, which doesn't support a random seed. 
Thus NumPy's random seed needs to be specified explicitly for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVClassifier. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingCVClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_. meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers. use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. cv : int, cross-validation generator or an iterable, optional (default: 2) Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 2-fold cross validation, - integer, to specify the number of folds in a (Stratified)KFold, - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. For integer/None inputs, it will use either a KFold or StratifiedKFold cross validation depending on the value of the stratify argument. use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. stratify : bool (default: True) If True, and the cv argument is an integer, it will follow a stratified K-Fold cross validation technique. If the cv argument is a specific cross validation technique, this argument is ignored. shuffle : bool (default: True) If True, and the cv argument is an integer, the training data will be shuffled at fitting stage prior to cross-validation. If the cv argument is a specific cross validation technique, this argument is ignored. verbose : int, optional (default=0) Controls the verbosity of the building process. 
- verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted and which fold is currently being used for fitting - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingCVClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API interface but are not compatible with scikit-learn's clone function. Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/","title":"StackingCVClassifier"},{"location":"api_subpackages/mlxtend.classifier/#methods_6","text":"fit(X, y, groups=None, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values.
groups : numpy array/None, shape = [n_samples] The group that each sample belongs to. This is used by specific folding strategies such as GroupKFold(). sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] Predicted class labels. predict_meta_features(X) Get meta-features of test-data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self","title":"Methods"},{"location":"api_subpackages/mlxtend.classifier/#stackingclassifier","text":"StackingClassifier(classifiers, meta_classifier, use_probas=False, average_probas=False, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, use_clones=True) A Stacking classifier for scikit-learn estimators for classification. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_. meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers. use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. average_probas : bool (default: False) Averages the probabilities as meta features if True. verbose : int, optional (default=0) Controls the verbosity of the building process.
- verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API interface but are not compatible with scikit-learn's clone function. Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/","title":"StackingClassifier"},{"location":"api_subpackages/mlxtend.classifier/#methods_7","text":"fit(X, y, sample_weight=None) Fit ensemble classifiers and the meta-classifier.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_outputs] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] or [n_samples, n_outputs] Predicted class labels. predict_meta_features(X) Get meta-features of test-data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] or a list of n_outputs of such arrays if n_outputs > 1.
Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self","title":"Methods"},{"location":"api_subpackages/mlxtend.cluster/","text":"mlxtend version: 0.14.0dev Kmeans Kmeans(k, max_iter=10, convergence_tolerance=1e-05, random_seed=None, print_progress=0) K-means clustering class. Added in 0.4.1dev Parameters k : int Number of clusters max_iter : int (default: 10) Maximum number of iterations during cluster assignment. Cluster re-assignment stops automatically when the algorithm has converged. convergence_tolerance : float (default: 1e-05) Compares current centroids with centroids of the previous iteration using the given tolerance (a small positive float) to determine if the algorithm converged early. random_seed : int (default: None) Set random state for the initial centroid assignment. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Iterations elapsed 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes centroids_ : 2d-array, shape={k, n_features} Feature values of the k cluster centroids.
clusters_ : dictionary The cluster assignments stored as a Python dictionary; the dictionary keys denote the cluster indices and the items are Python lists of the sample indices that were assigned to each cluster. iterations_ : int Number of iterations until convergence. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Kmeans/ Methods fit(X, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object.
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause","title":"Mlxtend.cluster"},{"location":"api_subpackages/mlxtend.cluster/#kmeans","text":"Kmeans(k, max_iter=10, convergence_tolerance=1e-05, random_seed=None, print_progress=0) K-means clustering class. Added in 0.4.1dev Parameters k : int Number of clusters max_iter : int (default: 10) Maximum number of iterations during cluster assignment. Cluster re-assignment stops automatically when the algorithm has converged. convergence_tolerance : float (default: 1e-05) Compares current centroids with centroids of the previous iteration using the given tolerance (a small positive float) to determine if the algorithm converged early. random_seed : int (default: None) Set random state for the initial centroid assignment. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Iterations elapsed 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes centroids_ : 2d-array, shape={k, n_features} Feature values of the k cluster centroids. clusters_ : dictionary The cluster assignments stored as a Python dictionary; the dictionary keys denote the cluster indices and the items are Python lists of the sample indices that were assigned to each cluster. iterations_ : int Number of iterations until convergence. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Kmeans/","title":"Kmeans"},{"location":"api_subpackages/mlxtend.cluster/#methods","text":"fit(X, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. init_params : bool (default: True) Re-initializes model parameters prior to fitting.
Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.cluster/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.cluster/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object.
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.cluster/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.cluster/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.data/","text":"mlxtend version: 0.14.0dev autompg_data autompg_data() Auto MPG dataset. Source : https://archive.ics.uci.edu/ml/datasets/Auto+MPG Number of samples : 392 Continuous target variable : mpg Dataset Attributes: 1) cylinders: multi-valued discrete 2) displacement: continuous 3) horsepower: continuous 4) weight: continuous 5) acceleration: continuous 6) model year: multi-valued discrete 7) origin: multi-valued discrete 8) car name: string (unique for each instance) Returns X, y : [n_samples, n_features], [n_targets] X is the feature matrix with 392 auto samples as rows and 8 feature columns (6 rows with NaNs removed). y is a 1-dimensional array of the target MPG values. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/autompg_data/ boston_housing_data boston_housing_data() Boston Housing dataset. Source : https://archive.ics.uci.edu/ml/datasets/Housing Number of samples : 506 Continuous target variable : MEDV MEDV = Median value of owner-occupied homes in $1000's Dataset Attributes: 1) CRIM per capita crime rate by town 2) ZN proportion of residential land zoned for lots over 25,000 sq.ft. 
3) INDUS proportion of non-retail business acres per town 4) CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) 5) NOX nitric oxides concentration (parts per 10 million) 6) RM average number of rooms per dwelling 7) AGE proportion of owner-occupied units built prior to 1940 8) DIS weighted distances to five Boston employment centres 9) RAD index of accessibility to radial highways 10) TAX full-value property-tax rate per $10,000 11) PTRATIO pupil-teacher ratio by town 12) B 1000(Bk - 0.63)^2 where Bk is the prop. of b. by town 13) LSTAT % lower status of the population Returns X, y : [n_samples, n_features], [n_targets] X is the feature matrix with 506 housing samples as rows and 13 feature columns. y is a 1-dimensional array of the continuous target variable MEDV. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/boston_housing_data/ iris_data iris_data() Iris flower dataset. Source : https://archive.ics.uci.edu/ml/datasets/Iris Number of samples : 150 Class labels : {0, 1, 2}, distribution: [50, 50, 50] 0 = setosa, 1 = versicolor, 2 = virginica. Dataset Attributes: 1) sepal length [cm] 2) sepal width [cm] 3) petal length [cm] 4) petal width [cm] Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 150 flower samples as rows and 4 feature columns: sepal length, sepal width, petal length, and petal width. y is a 1-dimensional array of the class labels {0, 1, 2} Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/iris_data/ loadlocal_mnist loadlocal_mnist(images_path, labels_path) Read MNIST from ubyte files. Parameters images_path : str path to the test or train MNIST ubyte file labels_path : str path to the test or train MNIST class labels file Returns images : [n_samples, n_pixels] numpy.array Pixel values of the images.
labels : [n_samples] numpy array Target class labels Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/loadlocal_mnist/ make_multiplexer_dataset make_multiplexer_dataset(address_bits=2, sample_size=100, positive_class_ratio=0.5, shuffle=False, random_seed=None) Function to create a binary n-bit multiplexer dataset. New in mlxtend v0.9 Parameters address_bits : int (default: 2) A positive integer that determines the number of address bits in the multiplexer, which in turn determines the n-bit capacity of the multiplexer and therefore the number of features. The number of features is determined by the number of address bits. For example, 2 address bits will result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). If address_bits=3, then this results in an 11-bit multiplexer as (3 + 2^3 = 11) with 11 features. sample_size : int (default: 100) The total number of samples generated. positive_class_ratio : float (default: 0.5) The fraction (a float between 0 and 1) of the sample_size samples in the dataset that have class label 1. If positive_class_ratio=0.5 (default), then the ratio of class 0 and class 1 samples is perfectly balanced. shuffle : bool (default: False) Whether or not to shuffle the features and labels. If False (default), the samples are returned in sorted order starting with sample_size/2 samples with class label 0 and followed by sample_size/2 samples with class label 1. random_seed : int (default: None) Random seed used for generating the multiplexer samples and shuffling. Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with the number of samples equal to sample_size. The number of features is determined by the number of address bits. For instance, 2 address bits will result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). All features are binary (values in {0, 1}). y is a 1-dimensional array of class labels in {0, 1}.
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/make_multiplexer_dataset mnist_data mnist_data() 5000 samples from the MNIST handwritten digits dataset. Data Source : http://yann.lecun.com/exdb/mnist/ Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 5000 image samples as rows; each row consists of 28x28 pixels that were unrolled into a 784-pixel feature vector. y contains the 10 unique class labels 0-9. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/mnist_data/ three_blobs_data three_blobs_data() A random dataset of three 2D blobs for clustering. Number of samples : 150 Suggested labels : {0, 1, 2}, distribution: [50, 50, 50] Returns X, y : [n_samples, n_features], [n_cluster_labels] X is the feature matrix with 150 samples as rows and 2 feature columns. y is a 1-dimensional array of the 3 suggested cluster labels 0, 1, 2 Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/three_blobs_data wine_data wine_data() Wine dataset. Source : https://archive.ics.uci.edu/ml/datasets/Wine Number of samples : 178 Class labels : {0, 1, 2}, distribution: [59, 71, 48] Dataset Attributes: 1) Alcohol 2) Malic acid 3) Ash 4) Alcalinity of ash 5) Magnesium 6) Total phenols 7) Flavanoids 8) Nonflavanoid phenols 9) Proanthocyanins 10) Color intensity 11) Hue 12) OD280/OD315 of diluted wines 13) Proline Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 178 wine samples as rows and 13 feature columns. y is a 1-dimensional array of the 3 class labels 0, 1, 2 Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/wine_data","title":"Mlxtend.data"},{"location":"api_subpackages/mlxtend.data/#autompg_data","text":"autompg_data() Auto MPG dataset.
Source : https://archive.ics.uci.edu/ml/datasets/Auto+MPG Number of samples : 392 Continuous target variable : mpg Dataset Attributes: 1) cylinders: multi-valued discrete 2) displacement: continuous 3) horsepower: continuous 4) weight: continuous 5) acceleration: continuous 6) model year: multi-valued discrete 7) origin: multi-valued discrete 8) car name: string (unique for each instance) Returns X, y : [n_samples, n_features], [n_targets] X is the feature matrix with 392 auto samples as rows and 8 feature columns (6 rows with NaNs removed). y is a 1-dimensional array of the target MPG values. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/autompg_data/","title":"autompg_data"},{"location":"api_subpackages/mlxtend.data/#boston_housing_data","text":"boston_housing_data() Boston Housing dataset. Source : https://archive.ics.uci.edu/ml/datasets/Housing Number of samples : 506 Continuous target variable : MEDV MEDV = Median value of owner-occupied homes in $1000's Dataset Attributes: 1) CRIM per capita crime rate by town 2) ZN proportion of residential land zoned for lots over 25,000 sq.ft. 3) INDUS proportion of non-retail business acres per town 4) CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) 5) NOX nitric oxides concentration (parts per 10 million) 6) RM average number of rooms per dwelling 7) AGE proportion of owner-occupied units built prior to 1940 8) DIS weighted distances to five Boston employment centres 9) RAD index of accessibility to radial highways 10) TAX full-value property-tax rate per $10,000 11) PTRATIO pupil-teacher ratio by town 12) B 1000(Bk - 0.63)^2 where Bk is the prop. of b. by town 13) LSTAT % lower status of the population Returns X, y : [n_samples, n_features], [n_targets] X is the feature matrix with 506 housing samples as rows and 13 feature columns.
y is a 1-dimensional array of the continuous target variable MEDV. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/boston_housing_data/","title":"boston_housing_data"},{"location":"api_subpackages/mlxtend.data/#iris_data","text":"iris_data() Iris flower dataset. Source : https://archive.ics.uci.edu/ml/datasets/Iris Number of samples : 150 Class labels : {0, 1, 2}, distribution: [50, 50, 50] 0 = setosa, 1 = versicolor, 2 = virginica. Dataset Attributes: 1) sepal length [cm] 2) sepal width [cm] 3) petal length [cm] 4) petal width [cm] Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 150 flower samples as rows and 4 feature columns: sepal length, sepal width, petal length, and petal width. y is a 1-dimensional array of the class labels {0, 1, 2} Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/iris_data/","title":"iris_data"},{"location":"api_subpackages/mlxtend.data/#loadlocal_mnist","text":"loadlocal_mnist(images_path, labels_path) Read MNIST from ubyte files. Parameters images_path : str path to the test or train MNIST ubyte file labels_path : str path to the test or train MNIST class labels file Returns images : [n_samples, n_pixels] numpy.array Pixel values of the images. labels : [n_samples] numpy array Target class labels Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/loadlocal_mnist/","title":"loadlocal_mnist"},{"location":"api_subpackages/mlxtend.data/#make_multiplexer_dataset","text":"make_multiplexer_dataset(address_bits=2, sample_size=100, positive_class_ratio=0.5, shuffle=False, random_seed=None) Function to create a binary n-bit multiplexer dataset. New in mlxtend v0.9 Parameters address_bits : int (default: 2) A positive integer that determines the number of address bits in the multiplexer, which in turn determines the n-bit capacity of the multiplexer and therefore the number of features.
The number of features is determined by the number of address bits. For example, 2 address bits will result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). If address_bits=3, then this results in an 11-bit multiplexer as (3 + 2^3 = 11) with 11 features. sample_size : int (default: 100) The total number of samples generated. positive_class_ratio : float (default: 0.5) The fraction (a float between 0 and 1) of the sample_size samples in the dataset that have class label 1. If positive_class_ratio=0.5 (default), then the ratio of class 0 and class 1 samples is perfectly balanced. shuffle : bool (default: False) Whether or not to shuffle the features and labels. If False (default), the samples are returned in sorted order starting with sample_size/2 samples with class label 0 and followed by sample_size/2 samples with class label 1. random_seed : int (default: None) Random seed used for generating the multiplexer samples and shuffling. Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with the number of samples equal to sample_size. The number of features is determined by the number of address bits. For instance, 2 address bits will result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). All features are binary (values in {0, 1}). y is a 1-dimensional array of class labels in {0, 1}. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/make_multiplexer_dataset","title":"make_multiplexer_dataset"},{"location":"api_subpackages/mlxtend.data/#mnist_data","text":"mnist_data() 5000 samples from the MNIST handwritten digits dataset. Data Source : http://yann.lecun.com/exdb/mnist/ Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 5000 image samples as rows; each row consists of 28x28 pixels that were unrolled into a 784-pixel feature vector. y contains the 10 unique class labels 0-9.
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/mnist_data/","title":"mnist_data"},{"location":"api_subpackages/mlxtend.data/#three_blobs_data","text":"three_blobs_data() A random dataset of three 2D blobs for clustering. Number of samples : 150 Suggested labels : {0, 1, 2}, distribution: [50, 50, 50] Returns X, y : [n_samples, n_features], [n_cluster_labels] X is the feature matrix with 150 samples as rows and 2 feature columns. y is a 1-dimensional array of the 3 suggested cluster labels 0, 1, 2 Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/three_blobs_data","title":"three_blobs_data"},{"location":"api_subpackages/mlxtend.data/#wine_data","text":"wine_data() Wine dataset. Source : https://archive.ics.uci.edu/ml/datasets/Wine Number of samples : 178 Class labels : {0, 1, 2}, distribution: [59, 71, 48] Dataset Attributes: 1) Alcohol 2) Malic acid 3) Ash 4) Alcalinity of ash 5) Magnesium 6) Total phenols 7) Flavanoids 8) Nonflavanoid phenols 9) Proanthocyanins 10) Color intensity 11) Hue 12) OD280/OD315 of diluted wines 13) Proline Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 178 wine samples as rows and 13 feature columns. y is a 1-dimensional array of the 3 class labels 0, 1, 2 Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/wine_data","title":"wine_data"},{"location":"api_subpackages/mlxtend.evaluate/","text":"mlxtend version: 0.14.0dev BootstrapOutOfBag BootstrapOutOfBag(n_splits=200, random_seed=None) Parameters n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. Returns train_idx : ndarray The training set indices for that split. test_idx : ndarray The testing set indices for that split.
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/BootstrapOutOfBag/ Methods get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator. Parameters X : object Always ignored, exists for compatibility with scikit-learn. y : object Always ignored, exists for compatibility with scikit-learn. groups : object Always ignored, exists for compatibility with scikit-learn. Returns n_splits : int Returns the number of splitting iterations in the cross-validator. split(X, y=None, groups=None) y : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn. groups : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn. PredefinedHoldoutSplit PredefinedHoldoutSplit(valid_indices) Train/Validation set splitter for sklearn's GridSearchCV etc. Uses user-specified validation set indices to split a dataset into train/validation sets. Parameters valid_indices : array-like, shape (num_examples,) Indices of the training examples in the training set to be used for validation. All other indices in the training set are used to form a training subset for model fitting. Methods get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator. Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set.
Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. valid_index : ndarray The validation set indices for that split. RandomHoldoutSplit RandomHoldoutSplit(valid_size=0.5, random_seed=None, stratify=False) Train/Validation set splitter for sklearn's GridSearchCV etc. Provides train/validation set indices to split a dataset into train/validation sets using random indices. Parameters valid_size : float (default: 0.5) Proportion of examples to be assigned as validation examples. The remaining 1 - valid_size proportion is automatically assigned to the training set. random_seed : int (default: None) The random seed for splitting the data into training and validation set partitions. stratify : bool (default: False) Whether to perform a stratified split. Methods get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of training examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. 
groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. valid_index : ndarray The validation set indices for that split. bootstrap bootstrap(x, func, num_rounds=1000, ci=0.95, ddof=1, seed=None) Implements the ordinary nonparametric bootstrap Parameters x : NumPy array, shape=(n_samples, [n_columns]) A one- or multidimensional array of data records func : A function which computes a statistic that is used to compute the bootstrap replicates (the statistic computed from the bootstrap samples). This function must return a scalar value. For example, np.mean or np.median would be an acceptable argument for func if x is a 1-dimensional array or vector. num_rounds : int (default=1000) The number of bootstrap samples to draw, where each bootstrap sample has the same number of records as the original dataset. ci : float (default=0.95) A float in the range (0, 1) that represents the confidence level for computing the confidence interval. For example, ci=0.95 (default) will compute the 95% confidence interval from the bootstrap replicates. ddof : int (default=1) The delta degrees of freedom used when computing the standard error. seed : int or None (default=None) Random seed for generating bootstrap samples. Returns original, standard_error, (lower_ci, upper_ci) : tuple Returns the statistic of the original sample ( original ), the standard error of the estimate, and the respective confidence interval bounds. Examples >>> import numpy as np >>> from mlxtend.evaluate import bootstrap >>> rng = np.random.RandomState(123) >>> x = rng.normal(loc=5., size=100) >>> original, std_err, ci_bounds = bootstrap(x, ... num_rounds=1000, ... func=np.mean, ... ci=0.95, ... seed=123) >>> print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, ... std_err, ... ci_bounds[0], ... 
ci_bounds[1])) Mean: 5.03, SE: +/- 0.11, CI95: [4.80, 5.26] >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap/ bootstrap_point632_score bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632', scoring_func=None, random_seed=None, clone_estimator=True) Implementation of the .632 [1] and .632+ [2] bootstrap for supervised learning References: [1] Efron, Bradley. 1983. \"Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.\" Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636. [2] Efron, Bradley, and Robert Tibshirani. 1997. \"Improvements on Cross-Validation: The .632+ Bootstrap Method.\" Journal of the American Statistical Association 92 (438): 548. doi:10.2307/2965703. Parameters estimator : object An estimator for classification or regression that follows the scikit-learn API and implements \"fit\" and \"predict\" methods. X : array-like The data to fit. Can be, for example, a list or an array that is at least 2D. y : array-like, optional, default: None The target variable to try to predict in the case of supervised learning. n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. method : str (default='.632') The bootstrap method, which can be one of - 1) '.632' bootstrap (default) - 2) '.632+' bootstrap - 3) 'oob' (regular out-of-bag, no weighting) for comparison studies. scoring_func : callable (default=None) Score function (or loss function) with signature scoring_func(y, y_pred, **kwargs). If None, uses classification accuracy if the estimator is a classifier and mean squared error if the estimator is a regressor. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. clone_estimator : bool (default=True) Clones the estimator if True, otherwise fits the original. 
Returns scores : array of float, shape=(n_splits,) Array of scores of the estimator for each bootstrap replicate. Examples >>> import numpy as np >>> from sklearn import datasets, linear_model >>> from mlxtend.evaluate import bootstrap_point632_score >>> iris = datasets.load_iris() >>> X = iris.data >>> y = iris.target >>> lr = linear_model.LogisticRegression() >>> scores = bootstrap_point632_score(lr, X, y) >>> acc = np.mean(scores) >>> print('Accuracy:', acc) Accuracy: 0.953023146884 >>> lower = np.percentile(scores, 2.5) >>> upper = np.percentile(scores, 97.5) >>> print('95%% Confidence interval: [%.2f, %.2f]' % (lower, upper)) 95% Confidence interval: [0.90, 0.98] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap_point632_score/ cochrans_q cochrans_q(y_target, *y_model_predictions) Cochran's Q test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy array. Returns q, p : float or None, float Returns the Q (chi-squared) value and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/cochrans_q/ combined_ftest_5x2cv combined_ftest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv combined F test proposed by Alpaydin (1999) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. 
If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns f : float The F-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/combined_ftest_5x2cv/ confusion_matrix confusion_matrix(y_target, y_predicted, binary=False, positive_label=1) Compute a confusion matrix/contingency table. Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: False) Maps a multi-class problem onto a binary confusion matrix, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. Returns mat : array-like, shape=[n_classes, n_classes] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix/ feature_importance_permutation feature_importance_permutation(X, y, predict_method, metric, num_rounds=1, seed=None) Feature importance estimation via permutation importance Parameters X : NumPy array, shape = [n_samples, n_features] Dataset, where n_samples is the number of samples and n_features is the number of features. y : NumPy array, shape = [n_samples] Target values. 
predict_method : prediction function A callable function that predicts the target values from X. metric : str, callable The metric for evaluating the feature importance through permutation. The string 'accuracy' is recommended for classifiers and the string 'r2' for regressors. Optionally, a custom scoring function (e.g., metric=scoring_func) that accepts two arguments, y_true and y_pred, which have the same shape as the y array. num_rounds : int (default=1) Number of rounds the feature columns are permuted to compute the permutation importance. seed : int or None (default=None) Random seed for permuting the feature columns. Returns mean_importance_vals, all_importance_vals : NumPy arrays. The first array, mean_importance_vals, has shape [n_features, ] and contains the importance values for all features. The shape of the second array is [n_features, num_rounds] and contains the feature importance for each repetition. If num_rounds=1, it contains the same values as the first array, mean_importance_vals. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/ ftest ftest(y_target, *y_model_predictions) F-test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy array. Returns f, p : float or None, float Returns the F-value and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/ftest/ lift_score lift_score(y_target, y_predicted, binary=True, positive_label=1) Lift measures the degree to which the predictions of a classification model are better than randomly-generated predictions. 
In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), the lift score is computed as: [ TP / (TP+FP) ] / [ (TP+FN) / (TP+TN+FP+FN) ] Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: True) Maps a multi-class problem onto a binary problem, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. Returns score : float Lift score in the range [0, \\infty ] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/lift_score/ mcnemar mcnemar(ary, corrected=True, exact=False) McNemar test for paired nominal data Parameters ary : array-like, shape=[2, 2] 2 x 2 contingency table (as returned by evaluate.mcnemar_table), where a: ary[0, 0]: # of samples that both models predicted correctly b: ary[0, 1]: # of samples that model 1 got right and model 2 got wrong c: ary[1, 0]: # of samples that model 2 got right and model 1 got wrong d: ary[1, 1]: # of samples that both models predicted incorrectly corrected : bool (default: True) Uses Edwards' continuity correction for chi-squared if True exact : bool (default: False) If True, uses an exact binomial test comparing b to a binomial distribution with n = b + c and p = 0.5. It is highly recommended to use exact=True for sample sizes < 25, since for small samples the test statistic is not well approximated by the chi-squared distribution. Returns chi2, p : float or None, float Returns the chi-squared value and the p-value; if exact=True (default: False), chi2 is None Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/ mcnemar_table mcnemar_table(y_target, y_model1, y_model2) Compute a 2x2 contingency table for McNemar's test. 
Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. y_model1 : array-like, shape=[n_samples] Predicted class labels from model 1 as 1D NumPy array. y_model2 : array-like, shape=[n_samples] Predicted class labels from model 2 as 1D NumPy array. Returns tb : array-like, shape=[2, 2] 2x2 contingency table with the following contents: a: tb[0, 0]: # of samples that both models predicted correctly b: tb[0, 1]: # of samples that model 1 got right and model 2 got wrong c: tb[1, 0]: # of samples that model 2 got right and model 1 got wrong d: tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_table/ mcnemar_tables mcnemar_tables(y_target, *y_model_predictions) Compute multiple 2x2 contingency tables for McNemar's test or Cochran's Q test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Predicted class labels, one array per model. Returns tables : dict Dictionary of NumPy arrays with shape=[2, 2]. Each dictionary key names the two models to be compared based on the order the models were passed as *y_model_predictions. 
The number of dictionary entries is equal to the number of pairwise combinations between the m models, i.e., \"m choose 2.\" For example, the following target array (containing the true labels) and 3 models y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0]) would result in the following dictionary: {'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]]), 'model_0 vs model_2': array([[ 3., 0.], [ 3., 4.]]), 'model_1 vs model_2': array([[ 3., 0.], [ 2., 5.]])} Each array is structured in the following way: tb[0, 0]: # of samples that both models predicted correctly tb[0, 1]: # of samples that model a got right and model b got wrong tb[1, 0]: # of samples that model b got right and model a got wrong tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/ paired_ttest_5x2cv paired_ttest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv paired t test proposed by Dietterich (1998) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. 
If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_5x2cv/ paired_ttest_kfold_cv paired_ttest_kfold_cv(estimator1, estimator2, X, y, cv=10, scoring=None, shuffle=False, random_seed=None) Implements the k-fold paired t test procedure to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. cv : int (default: 10) Number of splits and iterations for the cross-validation procedure. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. 
If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. shuffle : bool (default: False) Whether to shuffle the dataset for generating the k-fold splits. random_seed : int or None (default: None) Random seed for shuffling the dataset for generating the k-fold splits. Ignored if shuffle=False. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_kfold_cv/ paired_ttest_resampled paired_ttest_resampled(estimator1, estimator2, X, y, num_rounds=30, test_size=0.3, scoring=None, random_seed=None) Implements the resampled paired t test procedure to compare the performance of two models (also called k-hold-out paired t test). Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. num_rounds : int (default: 30) Number of resampling iterations (i.e., train/test splits). test_size : float or int (default: 0.3) If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to use as a test set. If int, represents the absolute number of test examples. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. 
If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_resampled/ permutation_test permutation_test(x, y, func='x_mean != y_mean', method='exact', num_rounds=1000, seed=None) Nonparametric permutation test Parameters x : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the first sample (e.g., the treatment group). y : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the second sample (e.g., the control group). func : custom function or str (default: 'x_mean != y_mean') Function to compute the statistic for the permutation test. - If 'x_mean != y_mean', uses func=lambda x, y: np.abs(np.mean(x) - np.mean(y)) for a two-sided test. - If 'x_mean > y_mean', uses func=lambda x, y: np.mean(x) - np.mean(y) for a one-sided test. - If 'x_mean < y_mean', uses func=lambda x, y: np.mean(y) - np.mean(x) for a one-sided test. method : 'approximate' or 'exact' (default: 'exact') If 'exact' (default), all possible permutations are considered. If 'approximate', the number of drawn samples is given by num_rounds. 
Note that 'exact' is typically not feasible unless the dataset size is relatively small. num_rounds : int (default: 1000) The number of permutation samples if method='approximate'. seed : int or None (default: None) The random seed for generating permutation samples if method='approximate'. Returns p-value under the null hypothesis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/ proportion_difference proportion_difference(proportion_1, proportion_2, n_1, n_2=None) Computes the test statistic and p-value for a difference of proportions test. Parameters proportion_1 : float The first proportion proportion_2 : float The second proportion n_1 : int The sample size of the first test sample n_2 : int or None (default=None) The sample size of the second test sample. If None, n_2 is set equal to n_1. Returns z, p : float or None, float Returns the z-score and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/proportion_difference/ scoring scoring(y_target, y_predicted, metric='error', positive_label=1, unique_labels='auto') Compute a scoring metric for supervised learning. Parameters y_target : array-like, shape=[n_values] True class labels or target values. y_predicted : array-like, shape=[n_values] Predicted class labels or target values. 
metric : str (default: 'error') Performance metric: 'accuracy': (TP + TN)/(FP + FN + TP + TN) = 1-ERR 'per-class accuracy': Average per-class accuracy 'per-class error': Average per-class error 'error': (FP + FN)/(FP + FN + TP + TN) = 1-ACC 'false_positive_rate': FP/N = FP/(FP + TN) 'true_positive_rate': TP/P = TP/(FN + TP) 'true_negative_rate': TN/N = TN/(FP + TN) 'precision': TP/(TP + FP) 'recall': equal to 'true_positive_rate' 'sensitivity': equal to 'true_positive_rate' or 'recall' 'specificity': equal to 'true_negative_rate' 'f1': 2 * (PRE * REC)/(PRE + REC) 'matthews_corr_coef': (TP*TN - FP*FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)) Where: [TP: true positives, TN: true negatives, FP: false positives, FN: false negatives, P: positives (TP + FN), N: negatives (FP + TN)] positive_label : int (default: 1) Label of the positive class for binary classification metrics. unique_labels : str or array-like (default: 'auto') If 'auto', deduces the unique class labels from y_target. Returns score : float Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/scoring/","title":"Mlxtend.evaluate"},{"location":"api_subpackages/mlxtend.evaluate/#bootstrapoutofbag","text":"BootstrapOutOfBag(n_splits=200, random_seed=None) Parameters n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. Returns train_idx : ndarray The training set indices for that split. test_idx : ndarray The testing set indices for that split. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/BootstrapOutOfBag/","title":"BootstrapOutOfBag"},{"location":"api_subpackages/mlxtend.evaluate/#methods","text":"get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility with scikit-learn. 
y : object Always ignored, exists for compatibility with scikit-learn. groups : object Always ignored, exists for compatibility with scikit-learn. Returns n_splits : int Returns the number of splitting iterations in the cross-validator. split(X, y=None, groups=None) y : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn. groups : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn.","title":"Methods"},{"location":"api_subpackages/mlxtend.evaluate/#predefinedholdoutsplit","text":"PredefinedHoldoutSplit(valid_indices) Train/Validation set splitter for sklearn's GridSearchCV etc. Splits a dataset into train/validation sets using user-specified validation set indices. Parameters valid_indices : array-like, shape (num_examples,) Indices of the training examples in the training set to be used for validation. All other indices in the training set are used as the training subset for model fitting.","title":"PredefinedHoldoutSplit"},{"location":"api_subpackages/mlxtend.evaluate/#methods_1","text":"get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. 
Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. valid_index : ndarray The validation set indices for that split.","title":"Methods"},{"location":"api_subpackages/mlxtend.evaluate/#randomholdoutsplit","text":"RandomHoldoutSplit(valid_size=0.5, random_seed=None, stratify=False) Train/Validation set splitter for sklearn's GridSearchCV etc. Provides train/validation set indices to split a dataset into train/validation sets using random indices. Parameters valid_size : float (default: 0.5) Proportion of examples to be assigned as validation examples. The remaining 1 - valid_size proportion is automatically assigned to the training set. random_seed : int (default: None) The random seed for splitting the data into training and validation set partitions. stratify : bool (default: False) Whether to perform a stratified split.","title":"RandomHoldoutSplit"},{"location":"api_subpackages/mlxtend.evaluate/#methods_2","text":"get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of training examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. 
valid_index : ndarray The validation set indices for that split.","title":"Methods"},{"location":"api_subpackages/mlxtend.evaluate/#bootstrap","text":"bootstrap(x, func, num_rounds=1000, ci=0.95, ddof=1, seed=None) Implements the ordinary nonparametric bootstrap Parameters x : NumPy array, shape=(n_samples, [n_columns]) A one- or multidimensional array of data records func : A function which computes a statistic that is used to compute the bootstrap replicates (the statistic computed from the bootstrap samples). This function must return a scalar value. For example, np.mean or np.median would be an acceptable argument for func if x is a 1-dimensional array or vector. num_rounds : int (default=1000) The number of bootstrap samples to draw, where each bootstrap sample has the same number of records as the original dataset. ci : float (default=0.95) A float in the range (0, 1) that represents the confidence level for computing the confidence interval. For example, ci=0.95 (default) will compute the 95% confidence interval from the bootstrap replicates. ddof : int (default=1) The delta degrees of freedom used when computing the standard error. seed : int or None (default=None) Random seed for generating bootstrap samples. Returns original, standard_error, (lower_ci, upper_ci) : tuple Returns the statistic of the original sample ( original ), the standard error of the estimate, and the respective confidence interval bounds. Examples >>> import numpy as np >>> from mlxtend.evaluate import bootstrap >>> rng = np.random.RandomState(123) >>> x = rng.normal(loc=5., size=100) >>> original, std_err, ci_bounds = bootstrap(x, ... num_rounds=1000, ... func=np.mean, ... ci=0.95, ... seed=123) >>> print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, ... std_err, ... ci_bounds[0], ... 
ci_bounds[1])) Mean: 5.03, SE: +/- 0.11, CI95: [4.80, 5.26] >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap/","title":"bootstrap"},{"location":"api_subpackages/mlxtend.evaluate/#bootstrap_point632_score","text":"bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632', scoring_func=None, random_seed=None, clone_estimator=True) Implementation of the .632 [1] and .632+ [2] bootstrap for supervised learning References: [1] Efron, Bradley. 1983. \"Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.\" Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636. [2] Efron, Bradley, and Robert Tibshirani. 1997. \"Improvements on Cross-Validation: The .632+ Bootstrap Method.\" Journal of the American Statistical Association 92 (438): 548. doi:10.2307/2965703. Parameters estimator : object An estimator for classification or regression that follows the scikit-learn API and implements \"fit\" and \"predict\" methods. X : array-like The data to fit. Can be, for example, a list or an array that is at least 2D. y : array-like, optional, default: None The target variable to try to predict in the case of supervised learning. n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. method : str (default='.632') The bootstrap method, which can be one of - 1) '.632' bootstrap (default) - 2) '.632+' bootstrap - 3) 'oob' (regular out-of-bag, no weighting) for comparison studies. scoring_func : callable (default=None) Score function (or loss function) with signature scoring_func(y, y_pred, **kwargs). If None, uses classification accuracy if the estimator is a classifier and mean squared error if the estimator is a regressor. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. clone_estimator : bool (default=True) Clones the estimator if True, otherwise fits the original. 
Returns scores : array of float, shape=(n_splits,) Array of scores of the estimator for each bootstrap replicate. Examples >>> import numpy as np >>> from sklearn import datasets, linear_model >>> from mlxtend.evaluate import bootstrap_point632_score >>> iris = datasets.load_iris() >>> X = iris.data >>> y = iris.target >>> lr = linear_model.LogisticRegression() >>> scores = bootstrap_point632_score(lr, X, y) >>> acc = np.mean(scores) >>> print('Accuracy:', acc) Accuracy: 0.953023146884 >>> lower = np.percentile(scores, 2.5) >>> upper = np.percentile(scores, 97.5) >>> print('95%% Confidence interval: [%.2f, %.2f]' % (lower, upper)) 95% Confidence interval: [0.90, 0.98] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap_point632_score/","title":"bootstrap_point632_score"},{"location":"api_subpackages/mlxtend.evaluate/#cochrans_q","text":"cochrans_q(y_target, *y_model_predictions) Cochran's Q test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy array. Returns q, p : float or None, float Returns the Q (chi-squared) value and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/cochrans_q/","title":"cochrans_q"},{"location":"api_subpackages/mlxtend.evaluate/#combined_ftest_5x2cv","text":"combined_ftest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv combined F test proposed by Alpaydin (1999) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {'accuracy', 'f1', 'precision', 'recall', 'roc_auc'} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns f : float The F-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/combined_ftest_5x2cv/","title":"combined_ftest_5x2cv"},{"location":"api_subpackages/mlxtend.evaluate/#confusion_matrix","text":"confusion_matrix(y_target, y_predicted, binary=False, positive_label=1) Compute a confusion matrix/contingency table. Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: False) Maps a multi-class problem onto a binary confusion matrix, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. 
Returns mat : array-like, shape=[n_classes, n_classes] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix/","title":"confusion_matrix"},{"location":"api_subpackages/mlxtend.evaluate/#feature_importance_permutation","text":"feature_importance_permutation(X, y, predict_method, metric, num_rounds=1, seed=None) Feature importance estimation via permutation importance Parameters X : NumPy array, shape = [n_samples, n_features] Dataset, where n_samples is the number of samples and n_features is the number of features. y : NumPy array, shape = [n_samples] Target values. predict_method : prediction function A callable function that predicts the target values from X. metric : str, callable The metric for evaluating the feature importance through permutation. The string 'accuracy' is recommended for classifiers and the string 'r2' is recommended for regressors. Optionally, a custom scoring function (e.g., metric=scoring_func) can be provided that accepts two arguments, y_true and y_pred, which have a shape similar to the y array. num_rounds : int (default=1) Number of rounds the feature columns are permuted to compute the permutation importance. seed : int or None (default=None) Random seed for permuting the feature columns. Returns mean_importance_vals, all_importance_vals : NumPy arrays. The first array, mean_importance_vals, has shape [n_features, ] and contains the importance values for all features. The shape of the second array is [n_features, num_rounds] and contains the feature importance for each repetition. If num_rounds=1, it contains the same values as the first array, mean_importance_vals. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/","title":"feature_importance_permutation"},{"location":"api_subpackages/mlxtend.evaluate/#ftest","text":"ftest(y_target, *y_model_predictions) F-test to compare 2 or more models. 
Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy array. Returns f, p : float or None, float Returns the F-value and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/ftest/","title":"ftest"},{"location":"api_subpackages/mlxtend.evaluate/#lift_score","text":"lift_score(y_target, y_predicted, binary=True, positive_label=1) Lift measures the degree to which the predictions of a classification model are better than randomly generated predictions. In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), the lift score is computed as: [ TP / (TP+FP) ] / [ (TP+FN) / (TP+TN+FP+FN) ] Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: True) Maps a multi-class problem onto a binary problem, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. 
Returns score : float Lift score in the range [0, \\infty] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/lift_score/","title":"lift_score"},{"location":"api_subpackages/mlxtend.evaluate/#mcnemar","text":"mcnemar(ary, corrected=True, exact=False) McNemar test for paired nominal data Parameters ary : array-like, shape=[2, 2] 2 x 2 contingency table (as returned by evaluate.mcnemar_table), where a: ary[0, 0]: # of samples that both models predicted correctly b: ary[0, 1]: # of samples that model 1 got right and model 2 got wrong c: ary[1, 0]: # of samples that model 2 got right and model 1 got wrong d: ary[1, 1]: # of samples that both models predicted incorrectly corrected : bool (default: True) Uses Edwards's continuity correction for chi-squared if True exact : bool (default: False) If True, uses an exact binomial test comparing b to a binomial distribution with n = b + c and p = 0.5. It is highly recommended to use exact=True for sample sizes < 25, since in that case the test statistic is not well approximated by the chi-squared distribution! Returns chi2, p : float or None, float Returns the chi-squared value and the p-value; if exact=True, chi2 is None Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/","title":"mcnemar"},{"location":"api_subpackages/mlxtend.evaluate/#mcnemar_table","text":"mcnemar_table(y_target, y_model1, y_model2) Compute a 2x2 contingency table for McNemar's test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. y_model1 : array-like, shape=[n_samples] Predicted class labels from model 1 as 1D NumPy array. y_model2 : array-like, shape=[n_samples] Predicted class labels from model 2 as 1D NumPy array. 
Returns tb : array-like, shape=[2, 2] 2x2 contingency table with the following contents: a: tb[0, 0]: # of samples that both models predicted correctly b: tb[0, 1]: # of samples that model 1 got right and model 2 got wrong c: tb[1, 0]: # of samples that model 2 got right and model 1 got wrong d: tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_table/","title":"mcnemar_table"},{"location":"api_subpackages/mlxtend.evaluate/#mcnemar_tables","text":"mcnemar_tables(y_target, *y_model_predictions) Compute multiple 2x2 contingency tables for McNemar's test or Cochran's Q test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-like, shape=[n_samples] Predicted class labels for a model. Returns tables : dict Dictionary of NumPy arrays with shape=[2, 2]. Each dictionary key names the two models to be compared based on the order the models were passed as *y_model_predictions. 
The number of dictionary entries is equal to the number of pairwise combinations between the m models, i.e., \"m choose 2.\" For example, the following target array (containing the true labels) and 3 models y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0]) would result in the following dictionary: {'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]]), 'model_0 vs model_2': array([[ 3., 0.], [ 3., 4.]]), 'model_1 vs model_2': array([[ 3., 0.], [ 2., 5.]])} Each array is structured in the following way: tb[0, 0]: # of samples that both models predicted correctly tb[0, 1]: # of samples that model a got right and model b got wrong tb[1, 0]: # of samples that model b got right and model a got wrong tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/","title":"mcnemar_tables"},{"location":"api_subpackages/mlxtend.evaluate/#paired_ttest_5x2cv","text":"paired_ttest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv paired t test proposed by Dietterich (1998) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. 
If str, uses a sklearn scoring metric string identifier, for example {'accuracy', 'f1', 'precision', 'recall', 'roc_auc'} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_5x2cv/","title":"paired_ttest_5x2cv"},{"location":"api_subpackages/mlxtend.evaluate/#paired_ttest_kfold_cv","text":"paired_ttest_kfold_cv(estimator1, estimator2, X, y, cv=10, scoring=None, shuffle=False, random_seed=None) Implements the k-fold paired t test procedure to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. cv : int (default: 10) Number of splits and iterations for the cross-validation procedure scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. 
If str, uses a sklearn scoring metric string identifier, for example {'accuracy', 'f1', 'precision', 'recall', 'roc_auc'} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. shuffle : bool (default: False) Whether to shuffle the dataset for generating the k-fold splits. random_seed : int or None (default: None) Random seed for shuffling the dataset for generating the k-fold splits. Ignored if shuffle=False. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_kfold_cv/","title":"paired_ttest_kfold_cv"},{"location":"api_subpackages/mlxtend.evaluate/#paired_ttest_resampled","text":"paired_ttest_resampled(estimator1, estimator2, X, y, num_rounds=30, test_size=0.3, scoring=None, random_seed=None) Implements the resampled paired t test procedure to compare the performance of two models (also called k-hold-out paired t test). Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. num_rounds : int (default: 30) Number of resampling iterations (i.e., train/test splits) test_size : float or int (default: 0.3) If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to use as a test set. 
If int, represents the absolute number of test examples. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {'accuracy', 'f1', 'precision', 'recall', 'roc_auc'} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and conclude that there is a significant difference between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_resampled/","title":"paired_ttest_resampled"},{"location":"api_subpackages/mlxtend.evaluate/#permutation_test","text":"permutation_test(x, y, func='x_mean != y_mean', method='exact', num_rounds=1000, seed=None) Nonparametric permutation test Parameters x : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the first sample (e.g., the treatment group). y : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the second sample (e.g., the control group). func : custom function or str (default: 'x_mean != y_mean') Function to compute the statistic for the permutation test. - If 'x_mean != y_mean', uses func=lambda x, y: np.abs(np.mean(x) - np.mean(y)) for a two-sided test. - If 'x_mean > y_mean', uses func=lambda x, y: np.mean(x) - np.mean(y) for a one-sided test. 
- If 'x_mean < y_mean', uses func=lambda x, y: np.mean(y) - np.mean(x) for a one-sided test. method : 'approximate' or 'exact' (default: 'exact') If 'exact' (default), all possible permutations are considered. If 'approximate', the number of drawn samples is given by num_rounds. Note that 'exact' is typically not feasible unless the dataset size is relatively small. num_rounds : int (default: 1000) The number of permutation samples if method='approximate'. seed : int or None (default: None) The random seed for generating permutation samples if method='approximate'. Returns the p-value under the null hypothesis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/","title":"permutation_test"},{"location":"api_subpackages/mlxtend.evaluate/#proportion_difference","text":"proportion_difference(proportion_1, proportion_2, n_1, n_2=None) Computes the test statistic and p-value for a difference of proportions test. Parameters proportion_1 : float The first proportion proportion_2 : float The second proportion n_1 : int The sample size of the first test sample n_2 : int or None (default=None) The sample size of the second test sample. If None, n_2 = n_1. Returns z, p : float or None, float Returns the z-score and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/proportion_difference/","title":"proportion_difference"},{"location":"api_subpackages/mlxtend.evaluate/#scoring","text":"scoring(y_target, y_predicted, metric='error', positive_label=1, unique_labels='auto') Compute a scoring metric for supervised learning. Parameters y_target : array-like, shape=[n_values] True class labels or target values. y_predicted : array-like, shape=[n_values] Predicted class labels or target values. 
metric : str (default: 'error') Performance metric: 'accuracy': (TP + TN)/(FP + FN + TP + TN) = 1-ERR 'per-class accuracy': Average per-class accuracy 'per-class error': Average per-class error 'error': (FP + FN)/(FP + FN + TP + TN) = 1-ACC 'false_positive_rate': FP/N = FP/(FP + TN) 'true_positive_rate': TP/P = TP/(FN + TP) 'true_negative_rate': TN/N = TN/(FP + TN) 'precision': TP/(TP + FP) 'recall': equal to 'true_positive_rate' 'sensitivity': equal to 'true_positive_rate' or 'recall' 'specificity': equal to 'true_negative_rate' 'f1': 2 * (PRE * REC)/(PRE + REC) 'matthews_corr_coef': (TP*TN - FP*FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)) Where: [TP: True positives, FP: False positives, TN: True negatives, FN: False negatives, P = TP + FN, N = FP + TN] positive_label : int (default: 1) Label of the positive class for binary classification metrics. unique_labels : str or array-like (default: 'auto') If 'auto', deduces the unique class labels from y_target Returns score : float Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/scoring/","title":"scoring"},{"location":"api_subpackages/mlxtend.externals/","text":"mlxtend version: 0.14.0dev","title":"Mlxtend.externals"},{"location":"api_subpackages/mlxtend.feature_extraction/","text":"mlxtend version: 0.14.0dev LinearDiscriminantAnalysis LinearDiscriminantAnalysis(n_discriminants=None) Linear Discriminant Analysis Class Parameters n_discriminants : int (default: None) The number of discriminants for transformation. Keeps the original dimensions of the dataset if None . Attributes w_ : array-like, shape=[n_features, n_discriminants] Projection matrix e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/LinearDiscriminantAnalysis/ Methods fit(X, y, n_classes=None) Fit the LDA model with X. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_discriminants] Projected training vectors. PrincipalComponentAnalysis PrincipalComponentAnalysis(n_components=None, solver='eigen') Principal Component Analysis Class Parameters n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None . 
solver : str (default: 'eigen') Method for performing the matrix decomposition: 'eigen' or 'svd'. Attributes w_ : array-like, shape=[n_features, n_components] Projection matrix e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. loadings_ : array-like, shape=[n_features, n_features] The factor loadings of the original variables onto the principal components. The columns are the principal components, and the rows are the feature loadings. For instance, the first column contains the loadings onto the first principal component. Note that the signs may be flipped depending on whether you use the 'eigen' or 'svd' solver; this does not affect the interpretation of the loadings, though. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/PrincipalComponentAnalysis/ Methods fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. 
Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors. RBFKernelPCA RBFKernelPCA(gamma=15.0, n_components=None, copy_X=True) RBF Kernel Principal Component Analysis for dimensionality reduction. Parameters gamma : float (default: 15.0) Free parameter (coefficient) of the RBF kernel. n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None . copy_X : bool (default: True) Copies training data, which is required to compute the projection of new data via the transform method. Uses a reference to X if False. Attributes e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. X_projected_ : array-like, shape=[n_samples, n_components] Training samples projected along the component axes. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/RBFKernelPCA/ Methods fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. 
adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause transform(X) Apply the non-linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"Mlxtend.feature extraction"},{"location":"api_subpackages/mlxtend.feature_extraction/#lineardiscriminantanalysis","text":"LinearDiscriminantAnalysis(n_discriminants=None) Linear Discriminant Analysis Class Parameters n_discriminants : int (default: None) The number of discriminants for transformation. Keeps the original dimensions of the dataset if None . Attributes w_ : array-like, shape=[n_features, n_discriminants] Projection matrix e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/LinearDiscriminantAnalysis/","title":"LinearDiscriminantAnalysis"},{"location":"api_subpackages/mlxtend.feature_extraction/#methods","text":"fit(X, y, n_classes=None) Fit the LDA model with X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
y : array-like, shape = [n_samples] Target values. n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.feature_extraction/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.feature_extraction/#license-bsd-3-clause","text":"set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.feature_extraction/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.feature_extraction/#license-bsd-3-clause_1","text":"transform(X) Apply the linear transformation on X. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_discriminants] Projected training vectors.","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.feature_extraction/#principalcomponentanalysis","text":"PrincipalComponentAnalysis(n_components=None, solver='eigen') Principal Component Analysis Class Parameters n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None . solver : str (default: 'eigen') Method for performing the matrix decomposition: 'eigen' or 'svd'. Attributes w_ : array-like, shape=[n_features, n_components] Projection matrix e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. loadings_ : array-like, shape=[n_features, n_features] The factor loadings of the original variables onto the principal components. The columns are the principal components, and the rows are the feature loadings. For instance, the first column contains the loadings onto the first principal component. Note that the signs may be flipped depending on whether you use the 'eigen' or 'svd' solver; this does not affect the interpretation of the loadings, though. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/PrincipalComponentAnalysis/","title":"PrincipalComponentAnalysis"},{"location":"api_subpackages/mlxtend.feature_extraction/#methods_1","text":"fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object get_params(deep=True) Get parameters for this estimator. 
Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. Adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.feature_extraction/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_2","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.feature_extraction/#license-bsd-3-clause_2","text":"set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self. Adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.feature_extraction/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_3","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.feature_extraction/#license-bsd-3-clause_3","text":"transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.feature_extraction/#rbfkernelpca","text":"RBFKernelPCA(gamma=15.0, n_components=None, copy_X=True) RBF Kernel Principal Component Analysis for dimensionality reduction. Parameters gamma : float (default: 15.0) Free parameter (coefficient) of the RBF kernel. n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None. copy_X : bool (default: True) Copies training data, which is required to compute the projection of new data via the transform method. Uses a reference to X if False. Attributes e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. X_projected_ : array-like, shape=[n_samples, n_components] Training samples projected along the component axes. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/RBFKernelPCA/","title":"RBFKernelPCA"},{"location":"api_subpackages/mlxtend.feature_extraction/#methods_2","text":"fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. 
Adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.feature_extraction/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_4","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.feature_extraction/#license-bsd-3-clause_4","text":"set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self. Adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.feature_extraction/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_5","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.feature_extraction/#license-bsd-3-clause_5","text":"transform(X) Apply the non-linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.feature_selection/","text":"mlxtend version: 0.14.0dev ColumnSelector ColumnSelector(cols=None, drop_axis=False) Object for selecting specific columns from a data set. Parameters cols : array-like (default: None) A list specifying the feature indices to be selected. 
For example, [1, 4, 5] to select the 2nd, 5th, and 6th feature columns. If None, returns all columns in the array. drop_axis : bool (default: False) Drops the last axis if True and only one column is selected. This is useful, e.g., when the ColumnSelector is used for selecting only one column and the resulting array should be fed to, e.g., a scikit-learn column selector. For example, instead of returning an array with shape (n_samples, 1), drop_axis=True will return an array with shape (n_samples,). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ColumnSelector/ Methods fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a slice of the input array. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features. ExhaustiveFeatureSelector ExhaustiveFeatureSelector(estimator, min_features=1, max_features=1, print_progress=True, scoring='accuracy', cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Exhaustive Feature Selection for Classification and Regression. (new in v0.4.3) Parameters estimator : scikit-learn classifier or regressor min_features : int (default: 1) Minimum number of features to select max_features : int (default: 1) Maximum number of features to select print_progress : bool (default: True) Prints progress as the number of epochs to stderr. scoring : str (default: 'accuracy') Scoring metric in {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} for regressors, or a callable object or function with signature scorer(estimator, X, y). cv : int (default: 5) Scikit-learn cross-validation generator or int. If estimator is a classifier (or y consists of integer class labels), stratified k-fold is performed, and regular k-fold cross-validation otherwise. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int, or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. 
Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs; an int, giving the exact number of total jobs that are spawned; a string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, it is required to set cv=0 and n_jobs=1. Attributes best_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. best_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. best_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the exhaustive selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset) 'feature_names' (tuple of feature names of the feature subset) 'cv_scores' (list of individual cross-validation scores) 'avg_score' (average cross-validation score) Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/ Methods fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names and self.subsets_[i]['feature_names']. (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data and return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns Feature subset of X, shape = [n_samples, k_features] get_metric_dict(confidence_interval=0.95) Return metric dictionary. Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset 'cv_scores': list with individual CV scores 'avg_score': CV average score 'std_dev': standard deviation of the CV score average 'std_err': standard error of the CV score average 'ci_bound': confidence interval bound of the CV score average get_params(deep=True) Get parameters for this estimator. 
Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self transform(X) Return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. Returns Feature subset of X, shape = [n_samples, k_features] SequentialFeatureSelector SequentialFeatureSelector(estimator, k_features=1, forward=True, floating=False, verbose=0, scoring=None, cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Sequential Feature Selection for Classification and Regression. Parameters estimator : scikit-learn classifier or regressor k_features : int or tuple or str (default: 1) Number of features to select, where k_features < the full feature set. New in 0.4.2: A tuple containing a min and max value can be provided, and the SFS will return any feature combination between min and max that scored highest in cross-validation. For example, the tuple (1, 4) will return any combination from 1 up to 4 features instead of a fixed number of features k. New in 0.8.0: A string argument \"best\" or \"parsimonious\". If \"best\" is provided, the feature selector will return the feature subset with the best cross-validation performance. If \"parsimonious\" is provided as an argument, the smallest feature subset that is within one standard error of the best cross-validation performance will be selected. 
forward : bool (default: True) Forward selection if True, backward selection otherwise. floating : bool (default: False) Adds a conditional exclusion/inclusion if True. verbose : int (default: 0) Level of verbosity to use in logging. If 0, no output; if 1, the number of features in the current set; if 2, detailed logging including timestamp and CV scores at each step. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. cv : int (default: 5) Integer or iterable yielding train, test splits. If cv is an integer and estimator is a classifier (or y consists of integer class labels), stratified k-fold is performed. Otherwise, regular k-fold cross-validation is performed. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int, or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. 
Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs; an int, giving the exact number of total jobs that are spawned; a string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, it is required to set cv=0 and n_jobs=1. Attributes k_feature_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. k_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. k_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the sequential selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset) 'feature_names' (tuple of feature names of the feature subset) 'cv_scores' (list of individual cross-validation scores) 'avg_score' (average cross-validation score) Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/ Methods fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: pandas DataFrames are now also accepted as argument for y. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names and self.subsets_[i]['feature_names']. (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data, then reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: a pandas Series is now also accepted as argument for y. fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns Reduced feature subset of X, shape = [n_samples, k_features] get_metric_dict(confidence_interval=0.95) Return metric dictionary. Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. 
The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset 'cv_scores': list with individual CV scores 'avg_score': CV average score 'std_dev': standard deviation of the CV score average 'std_err': standard error of the CV score average 'ci_bound': confidence interval bound of the CV score average get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self transform(X) Reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. Returns Reduced feature subset of X, shape = [n_samples, k_features]","title":"Mlxtend.feature selection"},{"location":"api_subpackages/mlxtend.feature_selection/#columnselector","text":"ColumnSelector(cols=None, drop_axis=False) Object for selecting specific columns from a data set. Parameters cols : array-like (default: None) A list specifying the feature indices to be selected. For example, [1, 4, 5] to select the 2nd, 5th, and 6th feature columns. If None, returns all columns in the array. drop_axis : bool (default: False) Drops the last axis if True and only one column is selected. This is useful, e.g., when the ColumnSelector is used for selecting only one column and the resulting array should be fed to, e.g., a scikit-learn column selector. 
For example, instead of returning an array with shape (n_samples, 1), drop_axis=True will return an array with shape (n_samples,). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ColumnSelector/","title":"ColumnSelector"},{"location":"api_subpackages/mlxtend.feature_selection/#methods","text":"fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features.","title":"Methods"},{"location":"api_subpackages/mlxtend.feature_selection/#exhaustivefeatureselector","text":"ExhaustiveFeatureSelector(estimator, min_features=1, max_features=1, print_progress=True, scoring='accuracy', cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Exhaustive Feature Selection for Classification and Regression. (new in v0.4.3) Parameters estimator : scikit-learn classifier or regressor min_features : int (default: 1) Minimum number of features to select max_features : int (default: 1) Maximum number of features to select print_progress : bool (default: True) Prints progress as the number of epochs to stderr. scoring : str (default: 'accuracy') Scoring metric in {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} for regressors, or a callable object or function with signature scorer(estimator, X, y). cv : int (default: 5) Scikit-learn cross-validation generator or int. If estimator is a classifier (or y consists of integer class labels), stratified k-fold is performed, and regular k-fold cross-validation otherwise. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int, or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. 
Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs; an int, giving the exact number of total jobs that are spawned; a string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, it is required to set cv=0 and n_jobs=1. Attributes best_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. best_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. best_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the exhaustive selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset) 'feature_names' (tuple of feature names of the feature subset) 'cv_scores' (list of individual cross-validation scores) 'avg_score' (average cross-validation score) Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/","title":"ExhaustiveFeatureSelector"},{"location":"api_subpackages/mlxtend.feature_selection/#methods_1","text":"fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names and self.subsets_[i]['feature_names']. (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data and return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns Feature subset of X, shape = [n_samples, k_features] get_metric_dict(confidence_interval=0.95) Return metric dictionary. Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. 
The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset 'cv_scores': list with individual CV scores 'avg_score': CV average score 'std_dev': standard deviation of the CV score average 'std_err': standard error of the CV score average 'ci_bound': confidence interval bound of the CV score average get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self transform(X) Return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. Returns Feature subset of X, shape = [n_samples, k_features]","title":"Methods"},{"location":"api_subpackages/mlxtend.feature_selection/#sequentialfeatureselector","text":"SequentialFeatureSelector(estimator, k_features=1, forward=True, floating=False, verbose=0, scoring=None, cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Sequential Feature Selection for Classification and Regression. Parameters estimator : scikit-learn classifier or regressor k_features : int or tuple or str (default: 1) Number of features to select, where k_features < the full feature set. New in 0.4.2: A tuple containing a min and max value can be provided, and the SFS will return any feature combination between min and max that scored highest in cross-validation. 
For example, the tuple (1, 4) will return any combination from 1 up to 4 features instead of a fixed number of features k. New in 0.8.0: A string argument \"best\" or \"parsimonious\". If \"best\" is provided, the feature selector will return the feature subset with the best cross-validation performance. If \"parsimonious\" is provided as an argument, the smallest feature subset that is within one standard error of the best cross-validation performance will be selected. forward : bool (default: True) Forward selection if True, backward selection otherwise. floating : bool (default: False) Adds a conditional exclusion/inclusion if True. verbose : int (default: 0) Level of verbosity to use in logging. If 0, no output; if 1, the number of features in the current set; if 2, detailed logging including timestamp and CV scores at each step. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. cv : int (default: 5) Integer or iterable yielding train, test splits. If cv is an integer and estimator is a classifier (or y consists of integer class labels), stratified k-fold is performed. Otherwise, regular k-fold cross-validation is performed. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. 
pre_dispatch : int or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs. An int, giving the exact number of total jobs that are spawned. A string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, it is required to set cv=0 and n_jobs=1. Attributes k_feature_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. k_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. k_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the sequential selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset) 'feature_names' (tuple of feature names of the feature subset) 'cv_scores' (list of individual cross-validation scores) 'avg_score' (average cross-validation score) Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. 
Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/","title":"SequentialFeatureSelector"},{"location":"api_subpackages/mlxtend.feature_selection/#methods_2","text":"fit(X, y, custom_feature_names=None, fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: pandas DataFrames are now also accepted as argument for y. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names_ and self.subsets_[i]['feature_names'] . (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. Returns self : object fit_transform(X, y, fit_params) Fit to training data, then reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: a pandas Series is now also accepted as argument for y. fit_params : dict of string -> object, optional Parameters to pass to the fit method of the classifier. 
Returns Reduced feature subset of X, shape={n_samples, k_features} get_metric_dict(confidence_interval=0.95) Return metric dictionary Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset 'cv_scores': list with individual CV scores 'avg_score': average of the CV scores 'std_dev': standard deviation of the CV score average 'std_err': standard error of the CV score average 'ci_bound': confidence interval bound of the CV score average get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X) Reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. 
Returns Reduced feature subset of X, shape={n_samples, k_features}","title":"Methods"},{"location":"api_subpackages/mlxtend.file_io/","text":"mlxtend version: 0.14.0dev find_filegroups find_filegroups(paths, substring='', extensions=None, validity_check=True, ignore_invisible=True, rstrip='', ignore_substring=None) Find and collect files from different directories in a python dictionary. Parameters paths : list Paths of the directories to be searched. Dictionary keys are built from the first directory. substring : str (default: '') Substring that all files have to contain to be considered. extensions : list (default: None) None or list of allowed file extensions for each path. If provided, the number of extensions must match the number of paths . validity_check : bool (default: True) If True , checks if all dictionary values have the same number of file paths. Prints a warning and returns an empty dictionary if the validity check failed. ignore_invisible : bool (default: True) If True , ignores invisible files (i.e., files starting with a period). rstrip : str (default: '') If provided, strips characters from the right side of the file base names after splitting the extension. Useful to trim different filenames to a common stem. E.g., \"abc_d.txt\" and \"abc_d_.csv\" would share the stem \"abc_d\" if rstrip is set to \"_\". ignore_substring : str (default: None) Ignores files that contain the specified substring. Returns groups : dict Dictionary of file paths. Keys are the file names found in the first directory listed in paths (without file extension). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_filegroups/ find_files find_files(substring, path, recursive=False, check_ext=None, ignore_invisible=True, ignore_substring=None) Find files in a directory based on substring matching. Parameters substring : str Substring of the file to be matched. path : str Path where to look. 
recursive : bool If True, searches subdirectories recursively. check_ext : str If string (e.g., '.txt'), only returns files that match the specified file extension. ignore_invisible : bool If True , ignores invisible files (i.e., files starting with a period). ignore_substring : str Ignores files that contain the specified substring. Returns results : list List of the matched files. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_files/","title":"Mlxtend.file io"},{"location":"api_subpackages/mlxtend.file_io/#find_filegroups","text":"find_filegroups(paths, substring='', extensions=None, validity_check=True, ignore_invisible=True, rstrip='', ignore_substring=None) Find and collect files from different directories in a python dictionary. Parameters paths : list Paths of the directories to be searched. Dictionary keys are built from the first directory. substring : str (default: '') Substring that all files have to contain to be considered. extensions : list (default: None) None or list of allowed file extensions for each path. If provided, the number of extensions must match the number of paths . validity_check : bool (default: True) If True , checks if all dictionary values have the same number of file paths. Prints a warning and returns an empty dictionary if the validity check failed. ignore_invisible : bool (default: True) If True , ignores invisible files (i.e., files starting with a period). rstrip : str (default: '') If provided, strips characters from the right side of the file base names after splitting the extension. Useful to trim different filenames to a common stem. E.g., \"abc_d.txt\" and \"abc_d_.csv\" would share the stem \"abc_d\" if rstrip is set to \"_\". ignore_substring : str (default: None) Ignores files that contain the specified substring. Returns groups : dict Dictionary of file paths. Keys are the file names found in the first directory listed in paths (without file extension). 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_filegroups/","title":"find_filegroups"},{"location":"api_subpackages/mlxtend.file_io/#find_files","text":"find_files(substring, path, recursive=False, check_ext=None, ignore_invisible=True, ignore_substring=None) Find files in a directory based on substring matching. Parameters substring : str Substring of the file to be matched. path : str Path where to look. recursive : bool If True, searches subdirectories recursively. check_ext : str If string (e.g., '.txt'), only returns files that match the specified file extension. ignore_invisible : bool If True , ignores invisible files (i.e., files starting with a period). ignore_substring : str Ignores files that contain the specified substring. Returns results : list List of the matched files. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_files/","title":"find_files"},{"location":"api_subpackages/mlxtend.frequent_patterns/","text":"mlxtend version: 0.14.0dev apriori apriori(df, min_support=0.5, use_colnames=False, max_len=None, n_jobs=1) Get frequent itemsets from a one-hot DataFrame Parameters df : pandas DataFrame or pandas SparseDataFrame pandas DataFrame in one-hot encoded format. The allowed values are either 0/1 or True/False. For example, Apple Bananas Beer Chicken Milk Rice 0 1 0 1 1 0 1 1 1 0 1 0 0 1 2 1 0 1 0 0 0 3 1 1 0 0 0 0 4 0 0 1 1 1 1 5 0 0 1 0 1 1 6 0 0 1 0 1 0 7 1 1 0 0 0 0 min_support : float (default: 0.5) A float between 0 and 1 for minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions. use_colnames : bool (default: False) If True, uses the DataFrame's column names in the returned DataFrame instead of column indices. max_len : int (default: None) Maximum length of the itemsets generated. If None (default), all possible itemset lengths (under the apriori condition) are evaluated. 
Returns pandas DataFrame with columns ['support', 'itemsets'] of all itemsets that are >= min_support and whose length is <= max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset , which is a Python built-in type that behaves similarly to sets except that it is immutable (For more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/ association_rules association_rules(df, metric='confidence', min_threshold=0.8, support_only=False) Generates a DataFrame of association rules including the metrics 'score', 'confidence', and 'lift' Parameters df : pandas DataFrame pandas DataFrame of frequent itemsets with columns ['support', 'itemsets'] metric : string (default: 'confidence') Metric to evaluate if a rule is of interest. Automatically set to 'support' if support_only=True . Otherwise, supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'. These metrics are computed as follows: - support(A->C) = support(A+C) [aka 'support'], range: [0, 1] - confidence(A->C) = support(A+C) / support(A), range: [0, 1] - lift(A->C) = confidence(A->C) / support(C), range: [0, inf] - leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1] - conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf] min_threshold : float (default: 0.8) Minimal threshold for the evaluation metric, via the metric parameter, to decide whether a candidate rule is of interest. support_only : bool (default: False) Only computes the rule support and fills the other metric columns with NaNs. This is useful if: a) the input DataFrame is incomplete, e.g., does not contain support values for all rule antecedents and consequents, or b) you simply want to speed up the computation because you don't need the other metrics. 
Returns pandas DataFrame with columns \"antecedents\" and \"consequents\" that store itemsets, plus the scoring metric columns: \"antecedent support\", \"consequent support\", \"support\", \"confidence\", \"lift\", \"leverage\", \"conviction\" of all rules for which metric(rule) >= min_threshold. Each entry in the \"antecedents\" and \"consequents\" columns is of type frozenset , which is a Python built-in type that behaves similarly to sets except that it is immutable (For more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/","title":"Mlxtend.frequent patterns"},{"location":"api_subpackages/mlxtend.frequent_patterns/#apriori","text":"apriori(df, min_support=0.5, use_colnames=False, max_len=None, n_jobs=1) Get frequent itemsets from a one-hot DataFrame Parameters df : pandas DataFrame or pandas SparseDataFrame pandas DataFrame in one-hot encoded format. The allowed values are either 0/1 or True/False. For example, Apple Bananas Beer Chicken Milk Rice 0 1 0 1 1 0 1 1 1 0 1 0 0 1 2 1 0 1 0 0 0 3 1 1 0 0 0 0 4 0 0 1 1 1 1 5 0 0 1 0 1 1 6 0 0 1 0 1 0 7 1 1 0 0 0 0 min_support : float (default: 0.5) A float between 0 and 1 for minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions. use_colnames : bool (default: False) If True, uses the DataFrame's column names in the returned DataFrame instead of column indices. max_len : int (default: None) Maximum length of the itemsets generated. If None (default), all possible itemset lengths (under the apriori condition) are evaluated. Returns pandas DataFrame with columns ['support', 'itemsets'] of all itemsets that are >= min_support and whose length is <= max_len (if max_len is not None). 
Each itemset in the 'itemsets' column is of type frozenset , which is a Python built-in type that behaves similarly to sets except that it is immutable (For more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/","title":"apriori"},{"location":"api_subpackages/mlxtend.frequent_patterns/#association_rules","text":"association_rules(df, metric='confidence', min_threshold=0.8, support_only=False) Generates a DataFrame of association rules including the metrics 'score', 'confidence', and 'lift' Parameters df : pandas DataFrame pandas DataFrame of frequent itemsets with columns ['support', 'itemsets'] metric : string (default: 'confidence') Metric to evaluate if a rule is of interest. Automatically set to 'support' if support_only=True . Otherwise, supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'. These metrics are computed as follows: - support(A->C) = support(A+C) [aka 'support'], range: [0, 1] - confidence(A->C) = support(A+C) / support(A), range: [0, 1] - lift(A->C) = confidence(A->C) / support(C), range: [0, inf] - leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1] - conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf] min_threshold : float (default: 0.8) Minimal threshold for the evaluation metric, via the metric parameter, to decide whether a candidate rule is of interest. support_only : bool (default: False) Only computes the rule support and fills the other metric columns with NaNs. This is useful if: a) the input DataFrame is incomplete, e.g., does not contain support values for all rule antecedents and consequents, or b) you simply want to speed up the computation because you don't need the other metrics. 
Returns pandas DataFrame with columns \"antecedents\" and \"consequents\" that store itemsets, plus the scoring metric columns: \"antecedent support\", \"consequent support\", \"support\", \"confidence\", \"lift\", \"leverage\", \"conviction\" of all rules for which metric(rule) >= min_threshold. Each entry in the \"antecedents\" and \"consequents\" columns is of type frozenset , which is a Python built-in type that behaves similarly to sets except that it is immutable (For more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/","title":"association_rules"},{"location":"api_subpackages/mlxtend.image/","text":"mlxtend version: 0.14.0dev extract_face_landmarks extract_face_landmarks(img, return_dtype= ) Function to extract face landmarks. Note that this function requires an installation of the Python version of the library \"dlib\": http://dlib.net Parameters img : array, shape = [h, w, ?] numpy array of a face image. Supported shapes are - 3D tensors with 1 or more color channels, for example, RGB: [h, w, 3] - 2D tensors without color channel, for example, Grayscale: [h, w] return_dtype: the return data-type of the array, default: np.int32. Returns landmarks : numpy.ndarray, shape = [68, 2] A numpy array, where each row contains the x-y coordinates of a landmark/point. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/sources/image/extract_face_landmarks.ipynb","title":"Mlxtend.image"},{"location":"api_subpackages/mlxtend.image/#extract_face_landmarks","text":"extract_face_landmarks(img, return_dtype= ) Function to extract face landmarks. Note that this function requires an installation of the Python version of the library \"dlib\": http://dlib.net Parameters img : array, shape = [h, w, ?] numpy array of a face image. 
Supported shapes are - 3D tensors with 1 or more color channels, for example, RGB: [h, w, 3] - 2D tensors without color channel, for example, Grayscale: [h, w] return_dtype: the return data-type of the array, default: np.int32. Returns landmarks : numpy.ndarray, shape = [68, 2] A numpy array, where each row contains the x-y coordinates of a landmark/point. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/sources/image/extract_face_landmarks.ipynb","title":"extract_face_landmarks"},{"location":"api_subpackages/mlxtend.math/","text":"mlxtend version: 0.14.0dev factorial factorial(n) None num_combinations num_combinations(n, k, with_replacement=False) Function to calculate the number of possible combinations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool (default: False) Allows repeated elements if True. Returns comb : int Number of possible combinations. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_combinations/ num_permutations num_permutations(n, k, with_replacement=False) Function to calculate the number of possible permutations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool (default: False) Allows repeated elements if True. Returns permut : int Number of possible permutations. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_permutations/ vectorspace_dimensionality vectorspace_dimensionality(ary) Computes the hyper-volume spanned by a vector set Parameters ary : array-like, shape=[num_vectors, num_vectors] An orthogonal set of vectors (arranged as columns in a matrix) Returns dimensions : int An integer indicating the \"dimensionality\" hyper-volume spanned by the vector set vectorspace_orthonormalization vectorspace_orthonormalization(ary, eps=1e-13) Transforms a set of column vectors to an orthonormal basis. 
Given a set of orthogonal vectors, this function converts such column vectors, arranged in a matrix, into orthonormal basis vectors. Parameters ary : array-like, shape=[num_vectors, num_vectors] An orthogonal set of vectors (arranged as columns in a matrix) eps : float (default: 1e-13) A small tolerance value to determine whether the vector norm is zero or not. Returns arr : array-like, shape=[num_vectors, num_vectors] An orthonormal set of vectors (arranged as columns)","title":"Mlxtend.math"},{"location":"api_subpackages/mlxtend.math/#factorial","text":"factorial(n) None","title":"factorial"},{"location":"api_subpackages/mlxtend.math/#num_combinations","text":"num_combinations(n, k, with_replacement=False) Function to calculate the number of possible combinations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool (default: False) Allows repeated elements if True. Returns comb : int Number of possible combinations. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_combinations/","title":"num_combinations"},{"location":"api_subpackages/mlxtend.math/#num_permutations","text":"num_permutations(n, k, with_replacement=False) Function to calculate the number of possible permutations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool (default: False) Allows repeated elements if True. Returns permut : int Number of possible permutations. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_permutations/","title":"num_permutations"},{"location":"api_subpackages/mlxtend.math/#vectorspace_dimensionality","text":"vectorspace_dimensionality(ary) Computes the hyper-volume spanned by a vector set Parameters ary : array-like, shape=[num_vectors, num_vectors] An orthogonal set of vectors (arranged as columns in a matrix) Returns dimensions : int An integer indicating the \"dimensionality\" hyper-volume spanned by the vector set","title":"vectorspace_dimensionality"},{"location":"api_subpackages/mlxtend.math/#vectorspace_orthonormalization","text":"vectorspace_orthonormalization(ary, eps=1e-13) Transforms a set of column vectors to an orthonormal basis. Given a set of orthogonal vectors, this function converts such column vectors, arranged in a matrix, into orthonormal basis vectors. Parameters ary : array-like, shape=[num_vectors, num_vectors] An orthogonal set of vectors (arranged as columns in a matrix) eps : float (default: 1e-13) A small tolerance value to determine whether the vector norm is zero or not. Returns arr : array-like, shape=[num_vectors, num_vectors] An orthonormal set of vectors (arranged as columns)","title":"vectorspace_orthonormalization"},{"location":"api_subpackages/mlxtend.plotting/","text":"mlxtend version: 0.14.0dev category_scatter category_scatter(x, y, label_col, data, markers='sxo^v', colors=('blue', 'green', 'red', 'purple', 'gray', 'cyan'), alpha=0.7, markersize=20.0, legend_loc='best') Scatter plot to plot categories in different colors/markerstyles. Parameters x : str or int DataFrame column name of the x-axis values or integer for the numpy ndarray column index. y : str or int DataFrame column name of the y-axis values or integer for the numpy ndarray column index. data : Pandas DataFrame object or NumPy ndarray. markers : str Markers that are cycled through the label category. colors : tuple Colors that are cycled through the label category. 
alpha : float (default: 0.7) Parameter to control the transparency. markersize : float (default: 20.0) Parameter to control the marker size. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/ checkerboard_plot checkerboard_plot(ary, cell_colors=('white', 'black'), font_colors=('black', 'white'), fmt='%.1f', figsize=None, row_labels=None, col_labels=None, fontsize=None) Plot a checkerboard table / heatmap via matplotlib. Parameters ary : array-like, shape = [n, m] A 2D NumPy array. cell_colors : tuple or list (default: ('white', 'black')) Tuple or list containing the two colors of the checkerboard pattern. font_colors : tuple or list (default: ('black', 'white')) Font colors corresponding to the cell colors. figsize : tuple (default: (2.5, 2.5)) Width and height of the figure. fmt : str (default: '%.1f') Python string formatter for cell values. The default '%.1f' results in floats with 1 digit after the decimal point. Use '%d' to show numbers as integers. row_labels : list (default: None) List of the row labels. Uses the array row indices 0 to n by default. col_labels : list (default: None) List of the column labels. Uses the array column indices 0 to m by default. fontsize : int (default: None) Specifies the font size of the checkerboard table. Uses matplotlib's default if None. Returns fig : matplotlib Figure object. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/checkerboard_plot/ ecdf ecdf(x, y_label='ECDF', x_label=None, ax=None, percentile=None, ecdf_color=None, ecdf_marker='o', percentile_color='black', percentile_linestyle='--') Plots an Empirical Cumulative Distribution Function Parameters x : array or list, shape=[n_samples,] Array-like object containing the feature values y_label : str (default='ECDF') Text label for the y-axis x_label : str (default=None) Text label for the x-axis ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None percentile : float (default=None) Float between 0 and 1 for plotting a percentile threshold line ecdf_color : matplotlib color (default=None) Color for the ECDF plot; uses matplotlib defaults if None ecdf_marker : matplotlib marker (default='o') Marker style for the ECDF plot percentile_color : matplotlib color (default='black') Color for the percentile threshold if percentile is not None percentile_linestyle : matplotlib linestyle (default='--') Line style for the percentile threshold if percentile is not None Returns ax : matplotlib.axes.Axes object percentile_threshold : float Feature threshold at the percentile or None if percentile=None percentile_count : int or None Number of samples that have a feature value less than or equal to the feature threshold at the percentile, or None if percentile=None Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/ecdf/ enrichment_plot enrichment_plot(df, colors='bgrkcy', markers=' ', linestyles='-', alpha=0.5, lw=2, where='post', grid=True, count_label='Count', xlim='auto', ylim='auto', invert_axes=False, legend_loc='best', ax=None) Plot stacked barplots Parameters df : pandas.DataFrame A pandas DataFrame where columns represent the different categories. colors: str (default: 'bgrkcy') The colors of the bars. 
markers : str (default: ' ') Matplotlib markerstyles, e.g., 'sov' for square, circle, and triangle markers. linestyles : str (default: '-') Matplotlib linestyles, e.g., '-,--' to cycle normal and dashed lines. Note that the different linestyles need to be separated by commas. alpha : float (default: 0.5) Transparency level from 0.0 to 1.0. lw : int or float (default: 2) Linewidth parameter. where : {'post', 'pre', 'mid'} (default: 'post') Starting location of the steps. grid : bool (default: True) Plots a grid if True. count_label : str (default: 'Count') Label for the \"Count\"-axis. xlim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the x-axis range. ylim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the y-axis range. invert_axes : bool (default: False) Plots count on the x-axis if True. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. ax : matplotlib axis, optional (default: None) Use this axis for plotting or make a new one otherwise Returns ax : matplotlib axis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/enrichment_plot/ plot_confusion_matrix plot_confusion_matrix(conf_mat, hide_spines=False, hide_ticks=False, figsize=None, cmap=None, colorbar=False, show_absolute=True, show_normed=False) Plot a confusion matrix via matplotlib. Parameters conf_mat : array-like, shape = [n_classes, n_classes] Confusion matrix from evaluate.confusion_matrix. hide_spines : bool (default: False) Hides axis spines if True. 
hide_ticks : bool (default: False) Hides axis ticks if True. figsize : tuple (default: (2.5, 2.5)) Width and height of the figure. cmap : matplotlib colormap (default: None) Uses matplotlib.pyplot.cm.Blues if None. colorbar : bool (default: False) Shows a colorbar if True. show_absolute : bool (default: True) Shows absolute confusion matrix coefficients if True. At least one of show_absolute or show_normed must be True. show_normed : bool (default: False) Shows normed confusion matrix coefficients if True. The normed confusion matrix coefficients give the proportion of training examples per class that are assigned the correct label. At least one of show_absolute or show_normed must be True. Returns fig, ax : matplotlib.pyplot subplot objects Figure and axis elements of the subplot. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_confusion_matrix/ plot_decision_regions plot_decision_regions(X, y, clf, feature_index=None, filler_feature_values=None, filler_feature_ranges=None, ax=None, X_highlight=None, res=None, legend=1, hide_spines=True, markers='s^oxv<>', colors='#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf', scatter_kwargs=None, contourf_kwargs=None, scatter_highlight_kwargs=None) Plot decision regions of a classifier. Please note that this function assumes that class labels are labeled consecutively, e.g., 0, 1, 2, 3, 4, and 5. If you have class labels with integer labels > 4, you may want to provide additional colors and/or markers as colors and markers arguments. See http://matplotlib.org/examples/color/named_colors.html for more information. Parameters X : array-like, shape = [n_samples, n_features] Feature matrix. y : array-like, shape = [n_samples] True class labels. clf : Classifier object. Must have a .predict method. feature_index : array-like (default: (0,) for 1D, (0, 1) otherwise) Feature indices to use for plotting. 
The first index in feature_index will be on the x-axis, the second index will be on the y-axis. filler_feature_values : dict (default: None) Only needed if the number of features is > 2. Dictionary of feature index-value pairs for the features not being plotted. filler_feature_ranges : dict (default: None) Only needed if the number of features is > 2. Dictionary of feature index-value pairs for the features not being plotted. Will use the ranges provided to select training samples for plotting. ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None. X_highlight : array-like, shape = [n_samples, n_features] (default: None) An array with data points that are used to highlight samples in X . res : float or array-like, shape = (2,) (default: None) This parameter was used to define the grid width, but it has been deprecated in favor of determining the number of points given the figure DPI and size automatically for optimal results and computational efficiency. To increase the resolution, it is recommended to provide a dpi argument via matplotlib, e.g., plt.figure(dpi=600). hide_spines : bool (default: True) Hide axis spines if True. legend : int (default: 1) Integer to specify the legend location. No legend if legend is 0. markers : str (default: 's^oxv<>') Scatterplot markers. colors : str (default: '#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf') Comma-separated list of colors. scatter_kwargs : dict (default: None) Keyword arguments for underlying matplotlib scatter function. contourf_kwargs : dict (default: None) Keyword arguments for underlying matplotlib contourf function. scatter_highlight_kwargs : dict (default: None) Keyword arguments for underlying matplotlib scatter function. 
Returns ax : matplotlib.axes.Axes object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/ plot_learning_curves plot_learning_curves(X_train, y_train, X_test, y_test, clf, train_marker='o', test_marker='^', scoring='misclassification error', suppress_plot=False, print_model=True, style='fivethirtyeight', legend_loc='best') Plots learning curves of a classifier. Parameters X_train : array-like, shape = [n_samples, n_features] Feature matrix of the training dataset. y_train : array-like, shape = [n_samples] True class labels of the training dataset. X_test : array-like, shape = [n_samples, n_features] Feature matrix of the test dataset. y_test : array-like, shape = [n_samples] True class labels of the test dataset. clf : Classifier object. Must have .predict and .fit methods. train_marker : str (default: 'o') Marker for the training set line plot. test_marker : str (default: '^') Marker for the test set line plot. scoring : str (default: 'misclassification error') If not 'misclassification error', accepts the following metrics (from scikit-learn): {'accuracy', 'average_precision', 'f1_micro', 'f1_macro', 'f1_weighted', 'f1_samples', 'log_loss', 'precision', 'recall', 'roc_auc', 'adjusted_rand_score', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} suppress_plot : bool (default: False) Suppress matplotlib plots if True. Recommended for testing purposes. print_model : bool (default: True) Print model parameters in plot title if True. 
style : str (default: 'fivethirtyeight') Matplotlib style. legend_loc : str (default: 'best') Where to place the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'} Returns errors : (training_error, test_error): tuple of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_learning_curves/ plot_linear_regression plot_linear_regression(X, y, model=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False), corr_func='pearsonr', scattercolor='blue', fit_style='k--', legend=True, xlim='auto') Plot a linear regression line fit. Parameters X : numpy array, shape = [n_samples,] Samples. y : numpy array, shape (n_samples,) Target values. model: object (default: sklearn.linear_model.LinearRegression) Estimator object for regression. Must implement .fit() and .predict() methods. corr_func: str or function (default: 'pearsonr') Uses pearsonr from scipy.stats to compute the correlation coefficient if corr_func='pearsonr'. Otherwise, the corr_func parameter expects a function that takes the x and y arrays as inputs and returns a tuple whose first element is the correlation coefficient. scattercolor: string (default: blue) Color of scatter plot points. fit_style: string (default: k--) Style for the line fit. legend: bool (default: True) Plots legend with corr_coeff coef., fit coef., and intercept values. xlim: array-like (x_min, x_max) or 'auto' (default: 'auto') X-axis limits for the linear line fit. Returns regression_fit : tuple intercept, slope, corr_coeff (float, float, float) Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_linear_regression/ plot_sequential_feature_selection plot_sequential_feature_selection(metric_dict, kind='std_dev', color='blue', bcolor='steelblue', marker='o', alpha=0.2, ylabel='Performance', confidence_interval=0.95) Plot feature selection results. 
Parameters metric_dict : mlxtend.SequentialFeatureSelector.get_metric_dict() object kind : str (default: \"std_dev\") The kind of error bar or confidence interval in {'std_dev', 'std_err', 'ci', None}. color : str (default: \"blue\") Color of the lineplot (accepts any matplotlib color name). bcolor : str (default: \"steelblue\") Color of the error bars / confidence intervals (accepts any matplotlib color name). marker : str (default: \"o\") Marker of the line plot (accepts any matplotlib marker name). alpha : float in [0, 1] (default: 0.2) Transparency of the error bars / confidence intervals. ylabel : str (default: \"Performance\") Y-axis label. confidence_interval : float (default: 0.95) Confidence level if kind='ci'. Returns fig : matplotlib.pyplot.figure() object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_sequential_feature_selection/ remove_borders remove_borders(axes, left=False, bottom=False, right=True, top=True) Remove chart junk from matplotlib plots. Parameters axes : iterable An iterable containing plt.gca() or plt.subplot() objects, e.g. [plt.gca()]. left : bool (default: False) Hide left axis spine if True. bottom : bool (default: False) Hide bottom axis spine if True. right : bool (default: True) Hide right axis spine if True. top : bool (default: True) Hide top axis spine if True. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/remove_chartjunk/ scatterplotmatrix scatterplotmatrix(X, fig_axes=None, names=None, figsize=(8, 8), alpha=1.0, **kwargs) Lower triangle of a scatterplot matrix. Parameters X : array-like, shape={num_examples, num_features} Design matrix containing data instances (examples) with multiple explanatory variables (features). 
fig_axes : tuple (default: None) A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplot function fig, axes = plt.subplots(...). names : list (default: None) A list of string names, which should have the same number of elements as there are features (columns) in X. figsize : tuple (default: (8, 8)) Height and width of the subplot grid. Ignored if fig_axes is not None. alpha : float (default: 1.0) Transparency for both the scatter plots and the histograms along the diagonal. **kwargs : kwargs Keyword arguments for the scatterplots. Returns fig_axes : tuple A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplot function fig, axes = plt.subplots(...). stacked_barplot stacked_barplot(df, bar_width='auto', colors='bgrcky', labels='index', rotation=90, legend_loc='best') Function to plot stacked barplots. Parameters df : pandas.DataFrame A pandas DataFrame where the index denotes the x-axis labels, and the columns contain the different measurements for each row. bar_width: 'auto' or float (default: 'auto') Parameter to set the widths of the bars. If 'auto', the width is automatically determined by the number of columns in the dataset. colors: str (default: 'bgrcky') The colors of the bars. labels: 'index' or iterable (default: 'index') If 'index', the DataFrame index will be used as x-tick labels. rotation: int (default: 90) Parameter to rotate the x-axis labels. 
legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/stacked_barplot/","title":"Mlxtend.plotting"},{"location":"api_subpackages/mlxtend.plotting/#category_scatter","text":"category_scatter(x, y, label_col, data, markers='sxo^v', colors=('blue', 'green', 'red', 'purple', 'gray', 'cyan'), alpha=0.7, markersize=20.0, legend_loc='best') Scatter plot to plot categories in different colors/markerstyles. Parameters x : str or int DataFrame column name of the x-axis values or integer for the numpy ndarray column index. y : str or int DataFrame column name of the y-axis values or integer for the numpy ndarray column index. data : Pandas DataFrame object or NumPy ndarray. markers : str Markers that are cycled through the label category. colors : tuple Colors that are cycled through the label category. alpha : float (default: 0.7) Parameter to control the transparency. markersize : float (default: 20.0) Parameter to control the marker size. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/","title":"category_scatter"},{"location":"api_subpackages/mlxtend.plotting/#checkerboard_plot","text":"checkerboard_plot(ary, cell_colors=('white', 'black'), font_colors=('black', 'white'), fmt='%.1f', figsize=None, row_labels=None, col_labels=None, fontsize=None) Plot a checkerboard table / heatmap via matplotlib. Parameters ary : array-like, shape = [n, m] A 2D NumPy array. cell_colors : tuple or list (default: ('white', 'black')) Tuple or list containing the two colors of the checkerboard pattern. 
font_colors : tuple or list (default: ('black', 'white')) Font colors corresponding to the cell colors. figsize : tuple (default: (2.5, 2.5)) Height and width of the figure. fmt : str (default: '%.1f') Python string formatter for cell values. The default '%.1f' results in floats with 1 digit after the decimal point. Use '%d' to show numbers as integers. row_labels : list (default: None) List of the row labels. Uses the array row indices 0 to n-1 by default. col_labels : list (default: None) List of the column labels. Uses the array column indices 0 to m-1 by default. fontsize : int (default: None) Specifies the font size of the checkerboard table. Uses matplotlib's default if None. Returns fig : matplotlib Figure object. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/checkerboard_plot/","title":"checkerboard_plot"},{"location":"api_subpackages/mlxtend.plotting/#ecdf","text":"ecdf(x, y_label='ECDF', x_label=None, ax=None, percentile=None, ecdf_color=None, ecdf_marker='o', percentile_color='black', percentile_linestyle='--') Plots an Empirical Cumulative Distribution Function. Parameters x : array or list, shape=[n_samples,] Array-like object containing the feature values. y_label : str (default='ECDF') Text label for the y-axis. x_label : str (default=None) Text label for the x-axis. ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. 
Creates one if ax=None. percentile : float (default=None) Float between 0 and 1 for plotting a percentile threshold line. ecdf_color : matplotlib color (default=None) Color for the ECDF plot; uses matplotlib defaults if None. ecdf_marker : matplotlib marker (default='o') Marker style for the ECDF plot. percentile_color : matplotlib color (default='black') Color for the percentile threshold if percentile is not None. percentile_linestyle : matplotlib linestyle (default='--') Line style for the percentile threshold if percentile is not None. Returns ax : matplotlib.axes.Axes object percentile_threshold : float Feature threshold at the percentile, or None if percentile=None. percentile_count : int or None Number of samples that have a feature value less than or equal to the feature threshold at the percentile threshold, or None if percentile=None. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/ecdf/","title":"ecdf"},{"location":"api_subpackages/mlxtend.plotting/#enrichment_plot","text":"enrichment_plot(df, colors='bgrkcy', markers=' ', linestyles='-', alpha=0.5, lw=2, where='post', grid=True, count_label='Count', xlim='auto', ylim='auto', invert_axes=False, legend_loc='best', ax=None) Plot enrichment plots (step plots). Parameters df : pandas.DataFrame A pandas DataFrame where columns represent the different categories. colors: str (default: 'bgrkcy') The colors of the lines. markers : str (default: ' ') Matplotlib markerstyles, e.g., 'sov' for square, circle, and triangle markers. linestyles : str (default: '-') Matplotlib linestyles, e.g., '-,--' to cycle normal and dashed lines. Note that the different linestyles need to be separated by commas. alpha : float (default: 0.5) Transparency level from 0.0 to 1.0. lw : int or float (default: 2) Linewidth parameter. where : {'post', 'pre', 'mid'} (default: 'post') Starting location of the steps. grid : bool (default: True) Plots a grid if True. 
count_label : str (default: 'Count') Label for the \"Count\"-axis. xlim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the x-axis range. ylim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the y-axis range. invert_axes : bool (default: False) Plots count on the x-axis if True. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. ax : matplotlib axis, optional (default: None) Use this axis for plotting or make a new one otherwise. Returns ax : matplotlib axis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/enrichment_plot/","title":"enrichment_plot"},{"location":"api_subpackages/mlxtend.plotting/#plot_confusion_matrix","text":"plot_confusion_matrix(conf_mat, hide_spines=False, hide_ticks=False, figsize=None, cmap=None, colorbar=False, show_absolute=True, show_normed=False) Plot a confusion matrix via matplotlib. Parameters conf_mat : array-like, shape = [n_classes, n_classes] Confusion matrix from evaluate.confusion_matrix. hide_spines : bool (default: False) Hides axis spines if True. hide_ticks : bool (default: False) Hides axis ticks if True. figsize : tuple (default: (2.5, 2.5)) Height and width of the figure. cmap : matplotlib colormap (default: None) Uses matplotlib.pyplot.cm.Blues if None. colorbar : bool (default: False) Shows a colorbar if True. show_absolute : bool (default: True) Shows absolute confusion matrix coefficients if True. At least one of show_absolute or show_normed must be True. show_normed : bool (default: False) Shows normed confusion matrix coefficients if True. The normed confusion matrix coefficients give the proportion of training examples per class that are assigned the correct label. At least one of show_absolute or show_normed must be True. Returns fig, ax : matplotlib.pyplot subplot objects Figure and axis elements of the subplot. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_confusion_matrix/","title":"plot_confusion_matrix"},{"location":"api_subpackages/mlxtend.plotting/#plot_decision_regions","text":"plot_decision_regions(X, y, clf, feature_index=None, filler_feature_values=None, filler_feature_ranges=None, ax=None, X_highlight=None, res=None, legend=1, hide_spines=True, markers='s^oxv<>', colors='#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf', scatter_kwargs=None, contourf_kwargs=None, scatter_highlight_kwargs=None) Plot decision regions of a classifier. Please note that this function assumes that class labels are labeled consecutively, e.g., 0, 1, 2, 3, 4, and 5. If you have class labels with integer labels > 4, you may want to provide additional colors and/or markers as colors and markers arguments. See http://matplotlib.org/examples/color/named_colors.html for more information. Parameters X : array-like, shape = [n_samples, n_features] Feature matrix. y : array-like, shape = [n_samples] True class labels. clf : Classifier object. Must have a .predict method. feature_index : array-like (default: (0,) for 1D, (0, 1) otherwise) Feature indices to use for plotting. The first index in feature_index will be on the x-axis, the second index will be on the y-axis. filler_feature_values : dict (default: None) Only needed if the number of features is greater than 2. Dictionary of feature index-value pairs for the features not being plotted. filler_feature_ranges : dict (default: None) Only needed if the number of features is greater than 2. Dictionary of feature index-value pairs for the features not being plotted. Will use the ranges provided to select training samples for plotting. ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None. X_highlight : array-like, shape = [n_samples, n_features] (default: None) An array with data points that are used to highlight samples in X. 
res : float or array-like, shape = (2,) (default: None) This parameter was used to define the grid width, but it has been deprecated in favor of determining the number of points given the figure DPI and size automatically for optimal results and computational efficiency. To increase the resolution, it is recommended to provide a dpi argument via matplotlib, e.g., plt.figure(dpi=600). hide_spines : bool (default: True) Hide axis spines if True. legend : int (default: 1) Integer to specify the legend location. No legend if legend is 0. markers : str (default: 's^oxv<>') Scatterplot markers. colors : str (default: '#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf') Comma-separated list of colors. scatter_kwargs : dict (default: None) Keyword arguments for the underlying matplotlib scatter function. contourf_kwargs : dict (default: None) Keyword arguments for the underlying matplotlib contourf function. scatter_highlight_kwargs : dict (default: None) Keyword arguments for the underlying matplotlib scatter function. Returns ax : matplotlib.axes.Axes object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/","title":"plot_decision_regions"},{"location":"api_subpackages/mlxtend.plotting/#plot_learning_curves","text":"plot_learning_curves(X_train, y_train, X_test, y_test, clf, train_marker='o', test_marker='^', scoring='misclassification error', suppress_plot=False, print_model=True, style='fivethirtyeight', legend_loc='best') Plots learning curves of a classifier. Parameters X_train : array-like, shape = [n_samples, n_features] Feature matrix of the training dataset. y_train : array-like, shape = [n_samples] True class labels of the training dataset. X_test : array-like, shape = [n_samples, n_features] Feature matrix of the test dataset. y_test : array-like, shape = [n_samples] True class labels of the test dataset. clf : Classifier object. Must have .predict and .fit methods. 
train_marker : str (default: 'o') Marker for the training set line plot. test_marker : str (default: '^') Marker for the test set line plot. scoring : str (default: 'misclassification error') If not 'misclassification error', accepts the following metrics (from scikit-learn): {'accuracy', 'average_precision', 'f1_micro', 'f1_macro', 'f1_weighted', 'f1_samples', 'log_loss', 'precision', 'recall', 'roc_auc', 'adjusted_rand_score', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} suppress_plot : bool (default: False) Suppress matplotlib plots if True. Recommended for testing purposes. print_model : bool (default: True) Print model parameters in plot title if True. style : str (default: 'fivethirtyeight') Matplotlib style. legend_loc : str (default: 'best') Where to place the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'} Returns errors : (training_error, test_error): tuple of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_learning_curves/","title":"plot_learning_curves"},{"location":"api_subpackages/mlxtend.plotting/#plot_linear_regression","text":"plot_linear_regression(X, y, model=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False), corr_func='pearsonr', scattercolor='blue', fit_style='k--', legend=True, xlim='auto') Plot a linear regression line fit. Parameters X : numpy array, shape = [n_samples,] Samples. y : numpy array, shape (n_samples,) Target values. model: object (default: sklearn.linear_model.LinearRegression) Estimator object for regression. Must implement .fit() and .predict() methods. corr_func: str or function (default: 'pearsonr') Uses pearsonr from scipy.stats to compute the correlation coefficient if corr_func='pearsonr'. Otherwise, the corr_func parameter expects a function that takes the x and y arrays as inputs and returns a tuple whose first element is the correlation coefficient. 
scattercolor: string (default: blue) Color of scatter plot points. fit_style: string (default: k--) Style for the line fit. legend: bool (default: True) Plots legend with corr_coeff coef., fit coef., and intercept values. xlim: array-like (x_min, x_max) or 'auto' (default: 'auto') X-axis limits for the linear line fit. Returns regression_fit : tuple intercept, slope, corr_coeff (float, float, float) Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_linear_regression/","title":"plot_linear_regression"},{"location":"api_subpackages/mlxtend.plotting/#plot_sequential_feature_selection","text":"plot_sequential_feature_selection(metric_dict, kind='std_dev', color='blue', bcolor='steelblue', marker='o', alpha=0.2, ylabel='Performance', confidence_interval=0.95) Plot feature selection results. Parameters metric_dict : mlxtend.SequentialFeatureSelector.get_metric_dict() object kind : str (default: \"std_dev\") The kind of error bar or confidence interval in {'std_dev', 'std_err', 'ci', None}. color : str (default: \"blue\") Color of the lineplot (accepts any matplotlib color name). bcolor : str (default: \"steelblue\") Color of the error bars / confidence intervals (accepts any matplotlib color name). marker : str (default: \"o\") Marker of the line plot (accepts any matplotlib marker name). alpha : float in [0, 1] (default: 0.2) Transparency of the error bars / confidence intervals. ylabel : str (default: \"Performance\") Y-axis label. confidence_interval : float (default: 0.95) Confidence level if kind='ci'. Returns fig : matplotlib.pyplot.figure() object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_sequential_feature_selection/","title":"plot_sequential_feature_selection"},{"location":"api_subpackages/mlxtend.plotting/#remove_borders","text":"remove_borders(axes, left=False, bottom=False, right=True, top=True) Remove chart junk from matplotlib plots. 
Parameters axes : iterable An iterable containing plt.gca() or plt.subplot() objects, e.g. [plt.gca()]. left : bool (default: False) Hide left axis spine if True. bottom : bool (default: False) Hide bottom axis spine if True. right : bool (default: True) Hide right axis spine if True. top : bool (default: True) Hide top axis spine if True. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/remove_chartjunk/","title":"remove_borders"},{"location":"api_subpackages/mlxtend.plotting/#scatterplotmatrix","text":"scatterplotmatrix(X, fig_axes=None, names=None, figsize=(8, 8), alpha=1.0, **kwargs) Lower triangle of a scatterplot matrix. Parameters X : array-like, shape={num_examples, num_features} Design matrix containing data instances (examples) with multiple explanatory variables (features). fig_axes : tuple (default: None) A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplot function fig, axes = plt.subplots(...). names : list (default: None) A list of string names, which should have the same number of elements as there are features (columns) in X. figsize : tuple (default: (8, 8)) Height and width of the subplot grid. Ignored if fig_axes is not None. alpha : float (default: 1.0) Transparency for both the scatter plots and the histograms along the diagonal. **kwargs : kwargs Keyword arguments for the scatterplots. 
Returns fig_axes : tuple A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplot function fig, axes = plt.subplots(...).","title":"scatterplotmatrix"},{"location":"api_subpackages/mlxtend.plotting/#stacked_barplot","text":"stacked_barplot(df, bar_width='auto', colors='bgrcky', labels='index', rotation=90, legend_loc='best') Function to plot stacked barplots. Parameters df : pandas.DataFrame A pandas DataFrame where the index denotes the x-axis labels, and the columns contain the different measurements for each row. bar_width: 'auto' or float (default: 'auto') Parameter to set the widths of the bars. If 'auto', the width is automatically determined by the number of columns in the dataset. colors: str (default: 'bgrcky') The colors of the bars. labels: 'index' or iterable (default: 'index') If 'index', the DataFrame index will be used as x-tick labels. rotation: int (default: 90) Parameter to rotate the x-axis labels. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right}. No legend if legend_loc=False. Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/stacked_barplot/","title":"stacked_barplot"},{"location":"api_subpackages/mlxtend.preprocessing/","text":"mlxtend version: 0.14.0dev CopyTransformer CopyTransformer() Transformer that returns a copy of the input array. For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/CopyTransformer/ Methods fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a copy of the input array. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_copy : copy of the input X array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a copy of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_copy : copy of the input X array. DenseTransformer DenseTransformer(return_copy=True) Convert a sparse array into a dense array. For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/DenseTransformer/ Methods fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a dense version of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_dense : dense version of the input X array. 
get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a dense version of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_dense : dense version of the input X array. MeanCenterer MeanCenterer() Column centering of vectors and matrices. Attributes col_means : numpy.ndarray [n_columns] NumPy array storing the mean values for centering after fitting the MeanCenterer object. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/MeanCenterer/ Methods fit(X) Gets the column means for mean centering. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns self fit_transform(X) Fits and transforms an array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features] A copy of the input array with the columns centered. transform(X) Centers a NumPy array. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features] A copy of the input array with the columns centered. OnehotTransactions OnehotTransactions(*args, **kwargs) Encoder class for transaction data in Python lists. Parameters None Attributes columns_: list List of unique names in the X input list of lists. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/ Methods fit(X) Learn unique column names from the transaction data. Parameters X : list of lists A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] fit_transform(X, sparse=False) Fit a TransactionEncoder and transform a dataset. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. inverse_transform(array) Transforms an encoded NumPy array back into transactions. 
Parameters array : NumPy array [n_transactions, n_unique_items] The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']. Returns X : list of lists A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X, sparse=False) Transform transactions into a one-hot encoded NumPy array. Parameters X : list of lists A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] sparse: bool (default=False) If True, transform returns a Compressed Sparse Row (CSR) matrix instead of a dense NumPy array. 
Returns array : NumPy array [n_transactions, n_unique_items] if sparse=False (default). Compressed Sparse Row matrix otherwise The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. Exact representation depends on the sparse argument For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'] TransactionEncoder TransactionEncoder() Encoder class for transaction data in Python lists Parameters None Attributes columns_: list List of unique names in the X input list of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/ Methods fit(X) Learn unique column names from transaction DataFrame Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] fit_transform(X, sparse=False) Fit a TransactionEncoder encoder and transform a dataset. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. 
inverse_transform(array) Transforms an encoded NumPy array back into transactions. Parameters array : NumPy array [n_transactions, n_unique_items] The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'] Returns X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, sparse=False) Transform transactions into a one-hot encoded NumPy array. Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. 
For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] sparse: bool (default=False) If True, transform will return Compressed Sparse Row matrix instead of the regular one. Returns array : NumPy array [n_transactions, n_unique_items] if sparse=False (default). Compressed Sparse Row matrix otherwise The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. Exact representation depends on the sparse argument For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'] minmax_scaling minmax_scaling(array, columns, min_val=0, max_val=1) Min max scaling of pandas' DataFrames. Parameters array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]. columns : array-like, shape = [n_columns] Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...] min_val : int or float , optional (default= 0 ) minimum value after rescaling. max_val : int or float , optional (default= 1 ) maximum value after rescaling. Returns df_new : pandas DataFrame object. Copy of the array or DataFrame with rescaled columns. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/minmax_scaling/ one_hot one_hot(y, num_labels='auto', dtype='float') One-hot encoding of class labels Parameters y : array-like, shape = [n_classlabels] Python list or numpy array consisting of class labels. num_labels : int or 'auto' Number of unique labels in the class label array. Infers the number of unique labels from the input array if set to 'auto'. dtype : str NumPy array type (float, float32, float64) of the output array. Returns ary : numpy.ndarray, shape = [n_classlabels] One-hot encoded array, where each sample is represented as a row vector in the returned array. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/ shuffle_arrays_unison shuffle_arrays_unison(arrays, random_seed=None) Shuffle NumPy arrays in unison. Parameters arrays : array-like, shape = [n_arrays] A list of NumPy arrays. random_seed : int (default: None) Sets the random state. Returns shuffled_arrays : A list of NumPy arrays after shuffling. Examples >>> import numpy as np >>> from mlxtend.preprocessing import shuffle_arrays_unison >>> X1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> y1 = np.array([1, 2, 3]) >>> X2, y2 = shuffle_arrays_unison(arrays=[X1, y1], random_seed=3) >>> assert(X2.all() == np.array([[4, 5, 6], [1, 2, 3], [7, 8, 9]]).all()) >>> assert(y2.all() == np.array([2, 1, 3]).all()) >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/shuffle_arrays_unison/ standardize standardize(array, columns=None, ddof=0, return_params=False, params=None) Standardize columns in pandas DataFrames. Parameters array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]. columns : array-like, shape = [n_columns] (default: None) Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...] If None, standardizes all columns. 
ddof : int (default: 0) Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. return_params : bool (default: False) If set to True, a dictionary is returned in addition to the standardized array. The parameter dictionary contains the column means ('avgs') and standard deviations ('stds') of the individual columns. params : dict (default: None) A dictionary with column means and standard deviations as returned by the standardize function if return_params was set to True. If a params dictionary is provided, the standardize function will use these instead of computing them from the current array. Notes If all values in a given column are the same, these values are all set to 0.0. The standard deviation in the parameters dictionary is consequently set to 1.0 to avoid dividing by zero. Returns df_new : pandas DataFrame object. Copy of the array or DataFrame with standardized columns. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/standardize/","title":"Mlxtend.preprocessing"},{"location":"api_subpackages/mlxtend.preprocessing/#copytransformer","text":"CopyTransformer() Transformer that returns a copy of the input array For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/CopyTransformer/","title":"CopyTransformer"},{"location":"api_subpackages/mlxtend.preprocessing/#methods","text":"fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a copy of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] (default: None) Returns X_copy : copy of the input X array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a copy of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_copy : copy of the input X array.","title":"Methods"},{"location":"api_subpackages/mlxtend.preprocessing/#densetransformer","text":"DenseTransformer(return_copy=True) Convert a sparse array into a dense array. For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/DenseTransformer/","title":"DenseTransformer"},{"location":"api_subpackages/mlxtend.preprocessing/#methods_1","text":"fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a dense version of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_dense : dense version of the input X array. 
get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a dense version of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_dense : dense version of the input X array.","title":"Methods"},{"location":"api_subpackages/mlxtend.preprocessing/#meancenterer","text":"MeanCenterer() Column centering of vectors and matrices. Attributes col_means : numpy.ndarray [n_columns] NumPy array storing the mean values for centering after fitting the MeanCenterer object. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/MeanCenterer/","title":"MeanCenterer"},{"location":"api_subpackages/mlxtend.preprocessing/#methods_2","text":"fit(X) Gets the column means for mean centering. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns self fit_transform(X) Fits and transforms an array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features] A copy of the input array with the columns centered.
transform(X) Centers a NumPy array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Array of data vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_tr : {array-like, sparse matrix}, shape = [n_samples, n_features] A copy of the input array with the columns centered.","title":"Methods"},{"location":"api_subpackages/mlxtend.preprocessing/#onehottransactions","text":"OnehotTransactions( args, * kwargs) Encoder class for transaction data in Python lists Parameters None Attributes columns_: list List of unique names in the X input list of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/","title":"OnehotTransactions"},{"location":"api_subpackages/mlxtend.preprocessing/#methods_3","text":"fit(X) Learn unique column names from transaction DataFrame Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] fit_transform(X, sparse=False) Fit a TransactionEncoder encoder and transform a dataset. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. inverse_transform(array) Transforms an encoded NumPy array back into transactions. 
Parameters array : NumPy array [n_transactions, n_unique_items] The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'] Returns X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, sparse=False) Transform transactions into a one-hot encoded NumPy array. Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] sparse: bool (default=False) If True, transform will return Compressed Sparse Row matrix instead of the regular one. 
Returns array : NumPy array [n_transactions, n_unique_items] if sparse=False (default). Compressed Sparse Row matrix otherwise The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. Exact representation depends on the sparse argument For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']","title":"Methods"},{"location":"api_subpackages/mlxtend.preprocessing/#transactionencoder","text":"TransactionEncoder() Encoder class for transaction data in Python lists Parameters None Attributes columns_: list List of unique names in the X input list of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/","title":"TransactionEncoder"},{"location":"api_subpackages/mlxtend.preprocessing/#methods_4","text":"fit(X) Learn unique column names from transaction DataFrame Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] fit_transform(X, sparse=False) Fit a TransactionEncoder encoder and transform a dataset. get_params(deep=True) Get parameters for this estimator. 
Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. inverse_transform(array) Transforms an encoded NumPy array back into transactions. Parameters array : NumPy array [n_transactions, n_unique_items] The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'] Returns X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] set_params( params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self transform(X, sparse=False) Transform transactions into a one-hot encoded NumPy array. Parameters X : list of lists A python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. 
For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']] sparse: bool (default=False) If True, transform will return Compressed Sparse Row matrix instead of the regular one. Returns array : NumPy array [n_transactions, n_unique_items] if sparse=False (default). Compressed Sparse Row matrix otherwise The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. Exact representation depends on the sparse argument For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]]) The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']","title":"Methods"},{"location":"api_subpackages/mlxtend.preprocessing/#minmax_scaling","text":"minmax_scaling(array, columns, min_val=0, max_val=1) Min max scaling of pandas' DataFrames. Parameters array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]. columns : array-like, shape = [n_columns] Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...] min_val : int or float , optional (default= 0 ) minimum value after rescaling. max_val : int or float , optional (default= 1 ) maximum value after rescaling. Returns df_new : pandas DataFrame object. Copy of the array or DataFrame with rescaled columns. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/minmax_scaling/","title":"minmax_scaling"},{"location":"api_subpackages/mlxtend.preprocessing/#one_hot","text":"one_hot(y, num_labels='auto', dtype='float') One-hot encoding of class labels Parameters y : array-like, shape = [n_classlabels] Python list or numpy array consisting of class labels. num_labels : int or 'auto' Number of unique labels in the class label array. Infers the number of unique labels from the input array if set to 'auto'. dtype : str NumPy array type (float, float32, float64) of the output array. Returns ary : numpy.ndarray, shape = [n_classlabels] One-hot encoded array, where each sample is represented as a row vector in the returned array. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/","title":"one_hot"},{"location":"api_subpackages/mlxtend.preprocessing/#shuffle_arrays_unison","text":"shuffle_arrays_unison(arrays, random_seed=None) Shuffle NumPy arrays in unison. Parameters arrays : array-like, shape = [n_arrays] A list of NumPy arrays. random_seed : int (default: None) Sets the random state. Returns shuffled_arrays : A list of NumPy arrays after shuffling. Examples >>> import numpy as np >>> from mlxtend.preprocessing import shuffle_arrays_unison >>> X1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> y1 = np.array([1, 2, 3]) >>> X2, y2 = shuffle_arrays_unison(arrays=[X1, y1], random_seed=3) >>> assert(X2.all() == np.array([[4, 5, 6], [1, 2, 3], [7, 8, 9]]).all()) >>> assert(y2.all() == np.array([2, 1, 3]).all()) >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/shuffle_arrays_unison/","title":"shuffle_arrays_unison"},{"location":"api_subpackages/mlxtend.preprocessing/#standardize","text":"standardize(array, columns=None, ddof=0, return_params=False, params=None) Standardize columns in pandas DataFrames. 
Parameters array : pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]. columns : array-like, shape = [n_columns] (default: None) Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...] If None, standardizes all columns. ddof : int (default: 0) Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. return_params : bool (default: False) If set to True, a dictionary is returned in addition to the standardized array. The parameter dictionary contains the column means ('avgs') and standard deviations ('stds') of the individual columns. params : dict (default: None) A dictionary with column means and standard deviations as returned by the standardize function if return_params was set to True. If a params dictionary is provided, the standardize function will use these instead of computing them from the current array. Notes If all values in a given column are the same, these values are all set to 0.0. The standard deviation in the parameters dictionary is consequently set to 1.0 to avoid dividing by zero. Returns df_new : pandas DataFrame object. Copy of the array or DataFrame with standardized columns. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/preprocessing/standardize/","title":"standardize"},{"location":"api_subpackages/mlxtend.regressor/","text":"mlxtend version: 0.14.0dev LinearRegression LinearRegression(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) Ordinary least squares linear regression. Parameters eta : float (default: 0.01) learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization.
If None: Normal Equations (closed-form solution) If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent learning If 1 < minibatches < len(y): Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr if not solver='normal equation' 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Sum of squared errors after each epoch; ignored if solver='normal equation' Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/LinearRegression/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. Adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features.
Returns target_values : array-like, shape = [n_samples] Predicted target values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form __ so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause StackingCVRegressor StackingCVRegressor(regressors, meta_regressor, cv=5, shuffle=True, use_features_in_secondary=False, store_train_meta_features=False, refit=True) A 'Stacking Cross-Validation' regressor for scikit-learn estimators. New in mlxtend v0.7.0 Notes The StackingCVRegressor uses scikit-learn's check_cv internally, which doesn't support a random seed. Thus NumPy's random seed needs to be specified explicitly for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVRegressor. Parameters regressors : array-like, shape = [n_regressors] A list of regressors. Invoking the fit method on the StackingCVRegressor will fit clones of these original regressors that will be stored in the class attribute self.regr_. meta_regressor : object The meta-regressor to be fitted on the ensemble of regressors cv : int, cross-validation generator or iterable, optional (default: 5) Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 5-fold cross validation, - integer, to specify the number of folds in a KFold, - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. For integer/None inputs, it will use KFold cross-validation use_features_in_secondary : bool (default: False) If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset.
If False, the meta-regressor will be trained only on the predictions of the original regressors. shuffle : bool (default: True) If True, and the cv argument is integer, the training data will be shuffled at fitting stage prior to cross-validation. If the cv argument is a specific cross validation technique, this argument is ignored. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-regressor are stored in the self.train_meta_features_ array, which can be accessed after calling fit. refit : bool (default: True) Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes train_meta_features_ : numpy array, shape = [n_samples, n_regressors] meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/StackingCVRegressor/ Methods fit(X, y, groups=None, sample_weight=None) Fit ensemble regressors and the meta-regressor. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values. groups : numpy array/None, shape = [n_samples] The group that each sample belongs to. This is used by specific folding strategies such as GroupKFold() sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weights to each regressor in the regressors list as well as the meta_regressor.
Raises an error if any regressor does not support sample_weight in its fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns y_target : array-like, shape = [n_samples] or [n_samples, n_targets] Predicted target values. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, len(self.regressors)] meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. score(X, y, sample_weight=None) Returns the coefficient of determination R^2 of the prediction. The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse).
A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0. Parameters X : array-like, shape = (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. y : array-like, shape = (n_samples,) or (n_samples, n_outputs) True values for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float R^2 of self.predict(X) with respect to y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self StackingRegressor StackingRegressor(regressors, meta_regressor, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, refit=True) A stacking regressor for scikit-learn estimators. Parameters regressors : array-like, shape = [n_regressors] A list of regressors. Invoking the fit method on the StackingRegressor will fit clones of those original regressors that will be stored in the class attribute self.regr_. meta_regressor : object The meta-regressor to be fitted on the ensemble of regressors. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1: Prints the number & name of the regressor being fitted - verbose=2: Prints info about the parameters of the regressor being fitted - verbose>2: Changes the verbose param of the underlying regressor to self.verbose - 2. use_features_in_secondary : bool (default: False) If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset.
If False, the meta-regressor will be trained only on the predictions of the original regressors. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-regressor are stored in the self.train_meta_features_ array, which can be accessed after calling fit. refit : bool (default: True) Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes regr_ : list, shape=[n_regressors] Fitted regressors (clones of the original regressors). meta_regr_ : estimator Fitted meta-regressor (clone of the original meta-estimator). coef_ : array-like, shape = [n_features] Model coefficients of the fitted meta-estimator. intercept_ : float Intercept of the fitted meta-estimator. train_meta_features_ : numpy array, shape = [n_samples, len(self.regressors)] meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/StackingRegressor/ Methods fit(X, y, sample_weight=None) Learn weight coefficients from training data for each regressor. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_targets] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each regressor in the regressors list as well as the meta_regressor. Raises an error if any regressor does not support sample_weight in its fit() method.
Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns y_target : array-like, shape = [n_samples] or [n_samples, n_targets] Predicted target values. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, len(self.regressors)] meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. score(X, y, sample_weight=None) Returns the coefficient of determination R^2 of the prediction. The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0. Parameters X : array-like, shape = (n_samples, n_features) Test samples.
For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. y : array-like, shape = (n_samples,) or (n_samples, n_outputs) True values for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float R^2 of self.predict(X) with respect to y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self Properties coef_ None intercept_ None","title":"Mlxtend.regressor"},{"location":"api_subpackages/mlxtend.regressor/#linearregression","text":"LinearRegression(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) Ordinary least squares linear regression. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0). epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution) If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent learning If 1 < minibatches < len(y): Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr if the model is not fitted via the normal equations (minibatches=None). 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting.
cost_ : list Sum of squared errors after each epoch; not computed when the model is fitted via the normal equations (minibatches=None). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/LinearRegression/","title":"LinearRegression"},{"location":"api_subpackages/mlxtend.regressor/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"Methods"},{"location":"api_subpackages/mlxtend.regressor/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.regressor/#license-bsd-3-clause","text":"predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines).
The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.regressor/#author-gael-varoquaux-amp103amp97amp101amp108amp46amp118amp97amp114amp111amp113amp117amp97amp117amp120amp64amp110amp111amp114amp109amp97amp108amp101amp115amp117amp112amp46amp111amp114amp103_1","text":"","title":"Author: Gael Varoquaux gael.varoquaux@normalesup.org"},{"location":"api_subpackages/mlxtend.regressor/#license-bsd-3-clause_1","text":"","title":"License: BSD 3 clause"},{"location":"api_subpackages/mlxtend.regressor/#stackingcvregressor","text":"StackingCVRegressor(regressors, meta_regressor, cv=5, shuffle=True, use_features_in_secondary=False, store_train_meta_features=False, refit=True) A 'Stacking Cross-Validation' regressor for scikit-learn estimators. New in mlxtend v0.7.0 Notes The StackingCVRegressor uses scikit-learn's check_cv internally, which doesn't support a random seed. Thus, NumPy's random seed needs to be specified explicitly for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVRegressor. Parameters regressors : array-like, shape = [n_regressors] A list of regressors. Invoking the fit method on the StackingCVRegressor will fit clones of these original regressors that will be stored in the class attribute self.regr_. meta_regressor : object The meta-regressor to be fitted on the ensemble of regressors. cv : int, cross-validation generator or iterable, optional (default: 5) Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 5-fold cross validation, - integer, to specify the number of folds in a KFold, - An object to be used as a cross-validation generator. - An iterable yielding train, test splits.
For integer/None inputs, it will use KFold cross-validation. use_features_in_secondary : bool (default: False) If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset. If False, the meta-regressor will be trained only on the predictions of the original regressors. shuffle : bool (default: True) If True, and the cv argument is an integer, the training data will be shuffled at the fitting stage prior to cross-validation. If the cv argument is a specific cross-validation technique, this argument is ignored. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-regressor are stored in the self.train_meta_features_ array, which can be accessed after calling fit. refit : bool (default: True) Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes train_meta_features_ : numpy array, shape = [n_samples, len(self.regressors)] meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/StackingCVRegressor/","title":"StackingCVRegressor"},{"location":"api_subpackages/mlxtend.regressor/#methods_1","text":"fit(X, y, groups=None, sample_weight=None) Fit ensemble regressors and the meta-regressor. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values.
groups : numpy array/None, shape = [n_samples] The group that each sample belongs to. This is used by specific folding strategies such as GroupKFold(). sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each regressor in the regressors list as well as the meta_regressor. Raises an error if any regressor does not support sample_weight in its fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns y_target : array-like, shape = [n_samples] or [n_samples, n_targets] Predicted target values. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, len(self.regressors)] meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. score(X, y, sample_weight=None) Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0. Parameters X : array-like, shape = (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. y : array-like, shape = (n_samples,) or (n_samples, n_outputs) True values for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float R^2 of self.predict(X) with respect to y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self","title":"Methods"},{"location":"api_subpackages/mlxtend.regressor/#stackingregressor","text":"StackingRegressor(regressors, meta_regressor, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, refit=True) A stacking regressor for scikit-learn estimators. Parameters regressors : array-like, shape = [n_regressors] A list of regressors. Invoking the fit method on the StackingRegressor will fit clones of those original regressors that will be stored in the class attribute self.regr_. meta_regressor : object The meta-regressor to be fitted on the ensemble of regressors. verbose : int, optional (default=0) Controls the verbosity of the building process.
- verbose=0 (default): Prints nothing - verbose=1: Prints the number & name of the regressor being fitted - verbose=2: Prints info about the parameters of the regressor being fitted - verbose>2: Changes the verbose param of the underlying regressor to self.verbose - 2. use_features_in_secondary : bool (default: False) If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset. If False, the meta-regressor will be trained only on the predictions of the original regressors. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-regressor are stored in the self.train_meta_features_ array, which can be accessed after calling fit. refit : bool (default: True) Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. Attributes regr_ : list, shape=[n_regressors] Fitted regressors (clones of the original regressors). meta_regr_ : estimator Fitted meta-regressor (clone of the original meta-estimator). coef_ : array-like, shape = [n_features] Model coefficients of the fitted meta-estimator. intercept_ : float Intercept of the fitted meta-estimator. train_meta_features_ : numpy array, shape = [n_samples, len(self.regressors)] meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors.
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/regressor/StackingRegressor/","title":"StackingRegressor"},{"location":"api_subpackages/mlxtend.regressor/#methods_2","text":"fit(X, y, sample_weight=None) Learn weight coefficients from training data for each regressor. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_targets] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each regressor in the regressors list as well as the meta_regressor. Raises an error if any regressor does not support sample_weight in its fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns y_target : array-like, shape = [n_samples] or [n_samples, n_targets] Predicted target values. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features.
Returns meta-features : numpy array, shape = [n_samples, len(self.regressors)] meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. score(X, y, sample_weight=None) Returns the coefficient of determination R^2 of the prediction. The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0. Parameters X : array-like, shape = (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. y : array-like, shape = (n_samples,) or (n_samples, n_outputs) True values for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float R^2 of self.predict(X) with respect to y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self","title":"Methods"},{"location":"api_subpackages/mlxtend.regressor/#properties","text":"coef_ None intercept_ None","title":"Properties"},{"location":"api_subpackages/mlxtend.text/","text":"mlxtend version: 0.14.0dev generalize_names generalize_names(name, output_sep=' ', firstname_output_letters=1) Generalize a person's first and last name. Returns a person's name as the last name, followed by output_sep and the first letter(s) of the first name (all lowercase). Parameters name : str Name of the person. output_sep : str (default: ' ') String for separating last name and first name in the output.
firstname_output_letters : int (default: 1) Number of letters in the abbreviated first name. Returns gen_name : str The generalized name. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/generalize_names/ generalize_names_duplcheck generalize_names_duplcheck(df, col_name) Generalizes names and removes duplicates. Applies mlxtend.text.generalize_names to a DataFrame with 1 first name letter by default and uses more first name letters if duplicates are detected. Parameters df : pandas.DataFrame DataFrame that contains a column where generalize_names should be applied. col_name : str Name of the DataFrame column to which the generalize_names function should be applied. Returns df_new : pandas.DataFrame New DataFrame object in which the generalize_names function has been applied without duplicates. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/generalize_names_duplcheck/ tokenizer_emoticons tokenizer_emoticons(text) Return emoticons from text. Examples >>> tokenizer_emoticons('This :) is :( a test :-)!') [':)', ':(', ':-)'] For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/tokenizer_emoticons/ tokenizer_words_and_emoticons tokenizer_words_and_emoticons(text) Convert text to lowercase words and emoticons. Examples >>> tokenizer_words_and_emoticons('This :) is :( a test :-)!') ['this', 'is', 'a', 'test', ':)', ':(', ':-)'] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/tokenizer_words_and_emoticons/","title":"Mlxtend.text"},{"location":"api_subpackages/mlxtend.text/#generalize_names","text":"generalize_names(name, output_sep=' ', firstname_output_letters=1) Generalize a person's first and last name. Returns a person's name as the last name, followed by output_sep and the first letter(s) of the first name (all lowercase). Parameters name : str Name of the person. output_sep : str (default: ' ') String for separating last name and first name in the output. firstname_output_letters : int (default: 1) Number of letters in the abbreviated first name.
Returns gen_name : str The generalized name. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/generalize_names/","title":"generalize_names"},{"location":"api_subpackages/mlxtend.text/#generalize_names_duplcheck","text":"generalize_names_duplcheck(df, col_name) Generalizes names and removes duplicates. Applies mlxtend.text.generalize_names to a DataFrame with 1 first name letter by default and uses more first name letters if duplicates are detected. Parameters df : pandas.DataFrame DataFrame that contains a column where generalize_names should be applied. col_name : str Name of the DataFrame column to which the generalize_names function should be applied. Returns df_new : pandas.DataFrame New DataFrame object in which the generalize_names function has been applied without duplicates. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/generalize_names_duplcheck/","title":"generalize_names_duplcheck"},{"location":"api_subpackages/mlxtend.text/#tokenizer_emoticons","text":"tokenizer_emoticons(text) Return emoticons from text. Examples >>> tokenizer_emoticons('This :) is :( a test :-)!') [':)', ':(', ':-)'] For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/tokenizer_emoticons/","title":"tokenizer_emoticons"},{"location":"api_subpackages/mlxtend.text/#tokenizer_words_and_emoticons","text":"tokenizer_words_and_emoticons(text) Convert text to lowercase words and emoticons. Examples >>> tokenizer_words_and_emoticons('This :) is :( a test :-)!') ['this', 'is', 'a', 'test', ':)', ':(', ':-)'] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/text/tokenizer_words_and_emoticons/","title":"tokenizer_words_and_emoticons"},{"location":"api_subpackages/mlxtend.utils/","text":"mlxtend version: 0.14.0dev Counter Counter(stderr=False, start_newline=True, precision=0, name=None) Class to display the progress of for-loop iterators.
Parameters stderr : bool (default: False) Prints output to sys.stderr if True; uses sys.stdout otherwise. start_newline : bool (default: True) Prepends a new line to the counter, which prevents overwriting counters if multiple counters are printed in succession. precision : int (default: 0) Sets the number of decimal places when displaying the time elapsed in seconds. name : string (default: None) Prepends the specified name before the counter to allow distinguishing between multiple counters. Attributes curr_iter : int The current iteration. start_time : float The system's time in seconds when the Counter was initialized. end_time : float The system's time in seconds when the Counter was last updated. Examples >>> import time >>> cnt = Counter() >>> for i in range(20): ... # do some computation ... time.sleep(0.1) ... cnt.update() 20 iter | 2 sec >>> print('The counter was initialized' ' %d seconds ago.' % (time.time() - cnt.start_time)) The counter was initialized 2 seconds ago. >>> print('The counter was last updated' ' %d seconds ago.' % (time.time() - cnt.end_time)) The counter was last updated 0 seconds ago. For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/utils/Counter/ Methods update() Print current iteration and time elapsed. assert_raises assert_raises(exception_type, message, func, *args, **kwargs) Check that an exception is raised with a specific message. Parameters exception_type : exception The exception that should be raised. message : str (default: None) The error message that should be raised. Ignored if False or None. func : callable The function that raises the exception. *args : Positional arguments to func. **kwargs : Keyword arguments to func. check_Xy check_Xy(X, y, y_int=True) None format_kwarg_dictionaries format_kwarg_dictionaries(default_kwargs=None, user_kwargs=None, protected_keys=None) Function to combine default and user-specified kwargs dictionaries. Parameters default_kwargs : dict, optional Default kwargs (default is None).
user_kwargs : dict, optional User-specified kwargs (default is None). protected_keys : array_like, optional Sequence of keys to be removed from the returned dictionary (default is None). Returns formatted_kwargs : dict Formatted kwargs dictionary.","title":"Mlxtend.utils"},{"location":"api_subpackages/mlxtend.utils/#counter","text":"Counter(stderr=False, start_newline=True, precision=0, name=None) Class to display the progress of for-loop iterators. Parameters stderr : bool (default: False) Prints output to sys.stderr if True; uses sys.stdout otherwise. start_newline : bool (default: True) Prepends a new line to the counter, which prevents overwriting counters if multiple counters are printed in succession. precision : int (default: 0) Sets the number of decimal places when displaying the time elapsed in seconds. name : string (default: None) Prepends the specified name before the counter to allow distinguishing between multiple counters. Attributes curr_iter : int The current iteration. start_time : float The system's time in seconds when the Counter was initialized. end_time : float The system's time in seconds when the Counter was last updated. Examples >>> import time >>> cnt = Counter() >>> for i in range(20): ... # do some computation ... time.sleep(0.1) ... cnt.update() 20 iter | 2 sec >>> print('The counter was initialized' ' %d seconds ago.' % (time.time() - cnt.start_time)) The counter was initialized 2 seconds ago. >>> print('The counter was last updated' ' %d seconds ago.' % (time.time() - cnt.end_time)) The counter was last updated 0 seconds ago.
For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/utils/Counter/","title":"Counter"},{"location":"api_subpackages/mlxtend.utils/#methods","text":"update() Print current iteration and time elapsed.","title":"Methods"},{"location":"api_subpackages/mlxtend.utils/#assert_raises","text":"assert_raises(exception_type, message, func, *args, **kwargs) Check that an exception is raised with a specific message. Parameters exception_type : exception The exception that should be raised. message : str (default: None) The error message that should be raised. Ignored if False or None. func : callable The function that raises the exception. *args : positional arguments to func. **kwargs : keyword arguments to func.","title":"assert_raises"},{"location":"api_subpackages/mlxtend.utils/#check_xy","text":"check_Xy(X, y, y_int=True) None","title":"check_Xy"},{"location":"api_subpackages/mlxtend.utils/#format_kwarg_dictionaries","text":"format_kwarg_dictionaries(default_kwargs=None, user_kwargs=None, protected_keys=None) Function to combine default and user-specified kwargs dictionaries. Parameters default_kwargs : dict, optional Default kwargs (default is None). user_kwargs : dict, optional User-specified kwargs (default is None). protected_keys : array_like, optional Sequence of keys to be removed from the returned dictionary (default is None). Returns formatted_kwargs : dict Formatted kwargs dictionary.","title":"format_kwarg_dictionaries"},{"location":"user_guide/classifier/Adaline/","text":"Adaptive Linear Neuron -- Adaline An implementation of the ADAptive LInear NEuron, Adaline, for binary classification tasks.
from mlxtend.classifier import Adaline Overview An illustration of the ADAptive LInear NEuron (Adaline) -- a single-layer artificial linear neuron with a threshold unit: The Adaline classifier is closely related to the Ordinary Least Squares (OLS) Linear Regression algorithm; in OLS regression, we find the line (or hyperplane) that minimizes the vertical offsets. In other words, we define the best-fitting line as the line that minimizes the sum of squared errors (SSE) or mean squared error (MSE) between our target variable (y) and our predicted output over all samples i in our dataset of size n . SSE = \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})^2 MSE = \\frac{1}{n} \\times SSE LinearRegression implements a linear regression model for performing ordinary least squares regression, and in Adaline, we add a threshold function g(\\cdot) to convert the continuous outcome to a categorical class label: $$y = g({z}) = \\begin{cases} 1 & \\text{if z $\\ge$ 0}\\\\ -1 & \\text{otherwise}. \\end{cases} $$ An Adaline model can be trained by one of the following three approaches: Normal Equations, Gradient Descent, or Stochastic Gradient Descent. Normal Equations (closed-form solution) The closed-form solution should be preferred for \"smaller\" datasets where computing the (\"costly\") matrix inverse is not a concern. For very large datasets, or datasets where the inverse of [X^T X] may not exist (the matrix is non-invertible or singular, e.g., in case of perfect multicollinearity), the gradient descent or stochastic gradient descent approaches are to be preferred. The linear function (linear regression model) is defined as: z = w_0x_0 + w_1x_1 + ... + w_mx_m = \\sum_{j=0}^{m} w_j x_j = \\mathbf{w}^T\\mathbf{x} where z is the net input (the continuous output of the linear model), \\mathbf{x} is an m -dimensional sample vector, and \\mathbf{w} is the weight vector (vector of coefficients). Note that w_0 represents the y-axis intercept of the model and therefore x_0=1 .
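The net input and threshold function defined above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up weights and samples, not mlxtend's internal code; it uses the 1/-1 threshold from the formula above, whereas the Adaline API expects class labels in {0, 1}.

```python
import numpy as np

# Made-up weights: w[0] is the intercept w_0, w[1:] are w_1..w_m
w = np.array([-0.5, 1.0, 2.0])

# Two samples with m = 2 features each (no intercept column yet)
X = np.array([[0.2, 0.1],
              [1.0, 0.5]])

# Prepend x_0 = 1 to every sample so that w_0 acts as the intercept
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])

# Net input z = w^T x, computed for all samples at once
z = X_aug @ w

# Threshold function g(z): 1 if z >= 0, -1 otherwise
y_pred = np.where(z >= 0.0, 1, -1)
print(z, y_pred)
```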
Using the closed-form solution (normal equation), we compute the weights of the model as follows: \\mathbf{w} = (\\mathbf{X}^T\\mathbf{X})^{-1}\\mathbf{X}^T\\mathbf{y} Gradient Descent (GD) and Stochastic Gradient Descent (SGD) Besides the closed-form solution, the Adaline model can be learned via Gradient Descent or Stochastic Gradient Descent. See Gradient Descent and Stochastic Gradient Descent and Deriving the Gradient Descent Rule for Linear Regression and Adaline for details. Random shuffling is implemented as follows: for one or more epochs, randomly shuffle the samples in the training set; then, for each training sample i, compute the gradients and perform the weight updates. References B. Widrow, M. E. Hoff, et al. Adaptive switching circuits. 1960. Example 1 - Closed Form Solution from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Adaline import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() ada = Adaline(epochs=30, eta=0.01, minibatches=None, random_seed=1) ada.fit(X, y) plot_decision_regions(X, y, clf=ada) plt.title('Adaline - Closed Form Solution') plt.show() Example 2 - Gradient Descent from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Adaline import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() ada = Adaline(epochs=30, eta=0.01, minibatches=1, # for Gradient Descent Learning random_seed=1, print_progress=3) ada.fit(X, y) plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Gradient Descent') plt.show() plt.plot(range(len(ada.cost_)), ada.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 30/30 | Cost 3.79 | Elapsed: 0:00:00 | ETA: 0:00:00 Example 3 - Stochastic Gradient Descent from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Adaline import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() ada = Adaline(epochs=15, eta=0.02, minibatches=len(y), # for SGD learning random_seed=1, print_progress=3) ada.fit(X, y) plot_decision_regions(X, y, clf=ada) plt.title('Adaline - Stochastic Gradient Descent') plt.show() plt.plot(range(len(ada.cost_)), ada.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 15/15 | Cost 3.81 | Elapsed: 0:00:00 | ETA: 0:00:00 Example 4 - Stochastic Gradient Descent with Minibatches from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Adaline import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() ada = Adaline(epochs=15, eta=0.02, minibatches=5, # for SGD learning w.
minibatch size 20 random_seed=1, print_progress=3) ada.fit(X, y) plot_decision_regions(X, y, clf=ada) plt.title('Adaline - Stochastic Gradient Descent') plt.show() plt.plot(range(len(ada.cost_)), ada.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 15/15 | Cost 3.87 | Elapsed: 0:00:00 | ETA: 0:00:00 API Adaline(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) ADAptive LInear NEuron classifier. Note that this implementation of Adaline expects binary class labels in {0, 1}. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0). epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution); if 1: Gradient Descent learning; if len(y): Stochastic Gradient Descent (SGD) online learning; if 1 < minibatches < len(y): SGD Minibatch learning. random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr unless the normal equation solver is used (minibatches=None). 0: No output; 1: Epochs elapsed and cost; 2: 1 plus time elapsed; 3: 2 plus estimated time until completion. Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Sum of squared errors after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Adaline/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values.
init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set to False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Input vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Input vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Adaptive Linear Neuron -- Adaline"},{"location":"user_guide/classifier/Adaline/#adaptive-linear-neuron-adaline","text":"An implementation of the ADAptive LInear NEuron, Adaline, for binary classification tasks. from mlxtend.classifier import Adaline","title":"Adaptive Linear Neuron -- Adaline"},{"location":"user_guide/classifier/Adaline/#overview","text":"An illustration of the ADAptive LInear NEuron (Adaline) -- a single-layer artificial linear neuron with a threshold unit: The Adaline classifier is closely related to the Ordinary Least Squares (OLS) Linear Regression algorithm; in OLS regression, we find the line (or hyperplane) that minimizes the vertical offsets. In other words, we define the best-fitting line as the line that minimizes the sum of squared errors (SSE) or mean squared error (MSE) between our target variable (y) and our predicted output over all samples i in our dataset of size n .
SSE = \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})^2 MSE = \\frac{1}{n} \\times SSE LinearRegression implements a linear regression model for performing ordinary least squares regression, and in Adaline, we add a threshold function g(\\cdot) to convert the continuous outcome to a categorical class label: $$y = g({z}) = \\begin{cases} 1 & \\text{if z $\\ge$ 0}\\\\ -1 & \\text{otherwise}. \\end{cases} $$ An Adaline model can be trained by one of the following three approaches: Normal Equations, Gradient Descent, or Stochastic Gradient Descent.","title":"Overview"},{"location":"user_guide/classifier/Adaline/#normal-equations-closed-form-solution","text":"The closed-form solution should be preferred for \"smaller\" datasets where computing the (\"costly\") matrix inverse is not a concern. For very large datasets, or datasets where the inverse of [X^T X] may not exist (the matrix is non-invertible or singular, e.g., in case of perfect multicollinearity), the gradient descent or stochastic gradient descent approaches are to be preferred. The linear function (linear regression model) is defined as: z = w_0x_0 + w_1x_1 + ... + w_mx_m = \\sum_{j=0}^{m} w_j x_j = \\mathbf{w}^T\\mathbf{x} where z is the net input (the continuous output of the linear model), \\mathbf{x} is an m -dimensional sample vector, and \\mathbf{w} is the weight vector (vector of coefficients). Note that w_0 represents the y-axis intercept of the model and therefore x_0=1 . Using the closed-form solution (normal equation), we compute the weights of the model as follows: \\mathbf{w} = (\\mathbf{X}^T\\mathbf{X})^{-1}\\mathbf{X}^T\\mathbf{y}","title":"Normal Equations (closed-form solution)"},{"location":"user_guide/classifier/Adaline/#gradient-descent-gd-and-stochastic-gradient-descent-sgd","text":"Besides the closed-form solution, the Adaline model can be learned via Gradient Descent or Stochastic Gradient Descent. See Gradient Descent and Stochastic Gradient Descent and Deriving the Gradient Descent Rule for Linear Regression and Adaline for details.
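As a rough sketch of how the closed-form and gradient-based approaches relate, the following NumPy snippet fits the same linear model on synthetic data both ways; the data, the learning rate eta, and the number of epochs are illustrative assumptions. It swaps the explicit matrix inverse for np.linalg.solve, which solves the same normal equations more stably, and runs full-batch gradient descent on the MSE cost. It is not mlxtend's Adaline implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n samples, m features, known weights [w_0, w_1, w_2]
n, m = 100, 2
X = rng.normal(size=(n, m))
true_w = np.array([0.5, -1.0, 2.0])

# Prepend the x_0 = 1 column so that w_0 acts as the intercept
X_aug = np.hstack([np.ones((n, 1)), X])
y = X_aug @ true_w + rng.normal(scale=0.1, size=n)

# Closed-form solution of the normal equations (X^T X) w = X^T y
w_normal = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

# Full-batch gradient descent on the MSE cost (1/n) * sum of squared errors
eta = 0.1  # learning rate (assumed value)
w_gd = np.zeros(m + 1)
for epoch in range(500):
    errors = y - X_aug @ w_gd
    grad = -2.0 / n * (X_aug.T @ errors)  # gradient of the MSE w.r.t. w
    w_gd -= eta * grad

print(np.round(w_normal, 3))
print(np.round(w_gd, 3))
```

With a sufficiently small learning rate and enough epochs, the gradient descent weights should agree with the closed-form weights to many decimal places. Stochastic or minibatch gradient descent would instead update the weights after each shuffled sample or minibatch.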
Random shuffling is implemented as follows: for one or more epochs, randomly shuffle the samples in the training set; then, for each training sample i, compute the gradients and perform the weight updates.","title":"Gradient Descent (GD) and Stochastic Gradient Descent (SGD)"},{"location":"user_guide/classifier/Adaline/#references","text":"B. Widrow, M. E. Hoff, et al. Adaptive switching circuits. 1960.","title":"References"},{"location":"user_guide/classifier/Adaline/#example-1-closed-form-solution","text":"from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Adaline import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() ada = Adaline(epochs=30, eta=0.01, minibatches=None, random_seed=1) ada.fit(X, y) plot_decision_regions(X, y, clf=ada) plt.title('Adaline - Closed Form Solution') plt.show()","title":"Example 1 - Closed Form Solution"},{"location":"user_guide/classifier/Adaline/#example-2-gradient-descent","text":"from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Adaline import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() ada = Adaline(epochs=30, eta=0.01, minibatches=1, # for Gradient Descent Learning random_seed=1, print_progress=3) ada.fit(X, y) plot_decision_regions(X, y, clf=ada) plt.title('Adaline - Gradient Descent') plt.show() plt.plot(range(len(ada.cost_)), ada.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 30/30 | Cost 3.79 | Elapsed: 0:00:00 | ETA:
0:00:00 ","title":"Example 2 - Gradient Descent"},{"location":"user_guide/classifier/Adaline/#example-3-stochastic-gradient-descent","text":"from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Adaline import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() ada = Adaline(epochs=15, eta=0.02, minibatches=len(y), # for SGD learning random_seed=1, print_progress=3) ada.fit(X, y) plot_decision_regions(X, y, clf=ada) plt.title('Adaline - Stochastic Gradient Descent') plt.show() plt.plot(range(len(ada.cost_)), ada.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 15/15 | Cost 3.81 | Elapsed: 0:00:00 | ETA: 0:00:00","title":"Example 3 - Stochastic Gradient Descent"},{"location":"user_guide/classifier/Adaline/#example-4-stochastic-gradient-descent-with-minibatches","text":"from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Adaline import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() ada = Adaline(epochs=15, eta=0.02, minibatches=5, # for SGD learning w. 
minibatch size 20 random_seed=1, print_progress=3) ada.fit(X, y) plot_decision_regions(X, y, clf=ada) plt.title('Adaline - Stochastic Gradient Descent') plt.show() plt.plot(range(len(ada.cost_)), ada.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 15/15 | Cost 3.87 | Elapsed: 0:00:00 | ETA: 0:00:00","title":"Example 4 - Stochastic Gradient Descent with Minibatches"},{"location":"user_guide/classifier/Adaline/#api","text":"Adaline(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0) ADAptive LInear NEuron classifier. Note that this implementation of Adaline expects binary class labels in {0, 1}. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0). epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. minibatches : int (default: None) The number of minibatches for gradient-based optimization. If None: Normal Equations (closed-form solution); if 1: Gradient Descent learning; if len(y): Stochastic Gradient Descent (SGD) online learning; if 1 < minibatches < len(y): SGD Minibatch learning. random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr unless the normal equation solver is used (minibatches=None). 0: No output; 1: Epochs elapsed and cost; 2: 1 plus time elapsed; 3: 2 plus estimated time until completion. Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Sum of squared errors after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Adaline/","title":"API"},{"location":"user_guide/classifier/Adaline/#methods","text":"fit(X, y, init_params=True) Learn model from training data.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set to False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Input vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Input vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Methods"},{"location":"user_guide/classifier/EnsembleVoteClassifier/","text":"EnsembleVoteClassifier Implementation of a majority voting EnsembleVoteClassifier for classification. from mlxtend.classifier import EnsembleVoteClassifier Overview The EnsembleVoteClassifier is a meta-classifier for combining similar or conceptually different machine learning classifiers for classification via majority or plurality voting. (For simplicity, we will refer to both majority and plurality voting as majority voting.) The EnsembleVoteClassifier implements \"hard\" and \"soft\" voting. In hard voting, we predict the final class label as the class label that has been predicted most frequently by the classification models. In soft voting, we predict the class labels by averaging the class probabilities (only recommended if the classifiers are well-calibrated).
Note If you are interested in using the EnsembleVoteClassifier , please note that it is now also available through scikit-learn (version 0.17 and later) as VotingClassifier . Majority Voting / Hard Voting Hard voting is the simplest case of majority voting. Here, we predict the class label \\hat{y} via majority (plurality) voting of each classifier C_j : \\hat{y}=mode\\{C_1(\\mathbf{x}), C_2(\\mathbf{x}), ..., C_m(\\mathbf{x})\\} Assuming that we combine three classifiers that classify a training sample as follows (classifier 1 -> class 0, classifier 2 -> class 0, classifier 3 -> class 1): \\hat{y}=mode\\{0, 0, 1\\} = 0 Via majority vote, we would classify the sample as \"class 0.\" Weighted Majority Vote In addition to the simple majority vote (hard voting) as described in the previous section, we can compute a weighted majority vote by associating a weight w_j with classifier C_j : \\hat{y} = \\arg \\max_i \\sum^{m}_{j=1} w_j \\chi_A \\big(C_j(\\mathbf{x})=i\\big), where \\chi_A is the characteristic function [C_j(\\mathbf{x}) = i \\; \\in A] , and A is the set of unique class labels. Continuing with the example from the previous section (classifier 1 -> class 0, classifier 2 -> class 0, classifier 3 -> class 1), assigning the weights {0.2, 0.2, 0.6} would yield a prediction \\hat{y} = 1 : \\arg \\max_i [0.2 \\times i_0 + 0.2 \\times i_0 + 0.6 \\times i_1] = 1 Soft Voting In soft voting, we predict the class labels based on the predicted probabilities p of each classifier -- this approach is only recommended if the classifiers are well-calibrated. \\hat{y} = \\arg \\max_i \\sum^{m}_{j=1} w_j p_{ij}, where w_j is the weight that can be assigned to the j-th classifier and p_{ij} is the probability that the j-th classifier assigns to class i.
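The hard and weighted majority votes worked through above can be reproduced with np.bincount, which counts (optionally weighted) votes per class label. This is a sketch under the assumption that class labels are non-negative integers; ties go to the smallest label because np.argmax returns the first maximum, which is not necessarily how EnsembleVoteClassifier breaks ties.

```python
import numpy as np

# Class labels predicted by classifiers C_1, C_2, C_3 for one sample
preds = np.array([0, 0, 1])

# Hard (plurality) vote: the most frequent label, i.e., the mode
hard_vote = np.argmax(np.bincount(preds))

# Weighted vote: add up the weight w_j of every classifier voting for class i
weights = np.array([0.2, 0.2, 0.6])
weighted_vote = np.argmax(np.bincount(preds, weights=weights))

print(hard_vote, weighted_vote)
```

Here np.bincount(preds) yields the vote counts [2, 1], and the weighted version yields [0.4, 0.6], matching the hand calculation.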
Assuming the example in the previous section was a binary classification task with class labels i \\in \\{0, 1\\} , our ensemble could make the following prediction: C_1(\\mathbf{x}) \\rightarrow [0.9, 0.1] C_2(\\mathbf{x}) \\rightarrow [0.8, 0.2] C_3(\\mathbf{x}) \\rightarrow [0.4, 0.6] Using uniform weights, we compute the average probabilities: p(i_0 \\mid \\mathbf{x}) = \\frac{0.9 + 0.8 + 0.4}{3} = 0.7 \\\\\\\\ p(i_1 \\mid \\mathbf{x}) = \\frac{0.1 + 0.2 + 0.6}{3} = 0.3 \\hat{y} = \\arg \\max_i \\big[p(i_0 \\mid \\mathbf{x}), p(i_1 \\mid \\mathbf{x}) \\big] = 0 However, assigning the weights {0.1, 0.1, 0.8} would yield a prediction \\hat{y} = 1 : p(i_0 \\mid \\mathbf{x}) = {0.1 \\times 0.9 + 0.1 \\times 0.8 + 0.8 \\times 0.4} = 0.49 \\\\\\\\ p(i_1 \\mid \\mathbf{x}) = {0.1 \\times 0.1 + 0.1 \\times 0.2 + 0.8 \\times 0.6} = 0.51 \\hat{y} = \\arg \\max_i \\big[p(i_0 \\mid \\mathbf{x}), p(i_1 \\mid \\mathbf{x}) \\big] = 1 References [1] S. Raschka. Python Machine Learning. Packt Publishing Ltd., 2015.
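The soft-voting arithmetic above can be checked in NumPy. The probabilities are the illustrative values from the example. Because the weights {0.1, 0.1, 0.8} sum to 1, their weighted sum equals the weighted average.

```python
import numpy as np

# Class-membership probabilities [p(i_0|x), p(i_1|x)] from C_1, C_2, C_3
probas = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.4, 0.6]])

# Uniform weights: plain average of the probabilities per class
avg_uniform = np.average(probas, axis=0, weights=[1, 1, 1])

# Weights {0.1, 0.1, 0.8}: weighted sum over the three classifiers
w = np.array([0.1, 0.1, 0.8])
sum_weighted = w @ probas

print(avg_uniform, np.argmax(avg_uniform))
print(sum_weighted, np.argmax(sum_weighted))
```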
Example 1 - Classifying Iris Flowers Using Different Classification Models from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier import numpy as np clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() print('5-fold cross validation:\\n') labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes'] for clf, label in zip([clf1, clf2, clf3], labels): scores = model_selection.cross_val_score(clf, X, y, cv=5, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 5-fold cross validation: Accuracy: 0.90 (+/- 0.05) [Logistic Regression] Accuracy: 0.93 (+/- 0.05) [Random Forest] Accuracy: 0.91 (+/- 0.04) [Naive Bayes] from mlxtend.classifier import EnsembleVoteClassifier eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[1,1,1]) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'Ensemble'] for clf, label in zip([clf1, clf2, clf3, eclf], labels): scores = model_selection.cross_val_score(clf, X, y, cv=5, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) Accuracy: 0.90 (+/- 0.05) [Logistic Regression] Accuracy: 0.93 (+/- 0.05) [Random Forest] Accuracy: 0.91 (+/- 0.04) [Naive Bayes] Accuracy: 0.95 (+/- 0.05) [Ensemble] Plotting Decision Regions import matplotlib.pyplot as plt from mlxtend.plotting import plot_decision_regions import matplotlib.gridspec as gridspec import itertools gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'Ensemble'] for clf, lab, grd in zip([clf1, clf2, clf3, eclf], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = 
plot_decision_regions(X=X, y=y, clf=clf) plt.title(lab) Example 2 - Grid Search from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target from sklearn.model_selection import GridSearchCV from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import EnsembleVoteClassifier clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting='soft') params = {'logisticregression__C': [1.0, 100.0], 'randomforestclassifier__n_estimators': [20, 200],} grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5) grid.fit(iris.data, iris.target) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) 0.953 +/- 0.01 {'logisticregression__C': 1.0, 'randomforestclassifier__n_estimators': 20} 0.960 +/- 0.01 {'logisticregression__C': 1.0, 'randomforestclassifier__n_estimators': 200} 0.960 +/- 0.01 {'logisticregression__C': 100.0, 'randomforestclassifier__n_estimators': 20} 0.953 +/- 0.02 {'logisticregression__C': 100.0, 'randomforestclassifier__n_estimators': 200} Note : If the EnsembleVoteClassifier is initialized with multiple similar estimator objects,
the estimator names are modified with consecutive integer indices, for example: clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) eclf = EnsembleVoteClassifier(clfs=[clf1, clf1, clf2], voting='soft') params = {'logisticregression-1__C': [1.0, 100.0], 'logisticregression-2__C': [1.0, 100.0], 'randomforestclassifier__n_estimators': [20, 200],} grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5) grid = grid.fit(iris.data, iris.target) Note The EnsembleVoteClassifier also enables grid search over the clfs argument. However, due to the current implementation of GridSearchCV in scikit-learn, it is not possible to search over both different classifiers and classifier parameters at the same time. For instance, while the following parameter dictionary works, params = {'randomforestclassifier__n_estimators': [1, 100], 'clfs': [(clf1, clf1, clf1), (clf2, clf3)]} it will use the instance settings of clf1 , clf2 , and clf3 and not overwrite them with the 'n_estimators' settings from 'randomforestclassifier__n_estimators': [1, 100] . Example 3 - Majority voting with classifiers trained on different feature subsets Feature selection algorithms implemented in scikit-learn as well as the SequentialFeatureSelector implement a transform method that passes the reduced feature subset to the next item in a Pipeline . For example, the method def transform(self, X): return X[:, self.k_feature_idx_] returns the best feature columns, k_feature_idx_ , given a dataset X. Thus, we simply need to construct a Pipeline consisting of the feature selector and the classifier in order to select different feature subsets for different algorithms. During fitting , the optimal feature subsets are automatically determined via the GridSearchCV object, and by calling predict , the fitted feature selector in the pipeline only passes along the columns that resulted in the best performance for the respective classifier.
from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, :], iris.target from sklearn.model_selection import GridSearchCV from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import EnsembleVoteClassifier from sklearn.pipeline import Pipeline from mlxtend.feature_selection import SequentialFeatureSelector clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() # Creating a feature-selection-classifier pipeline sfs1 = SequentialFeatureSelector(clf1, k_features=4, forward=True, floating=False, scoring='accuracy', verbose=0, cv=0) clf1_pipe = Pipeline([('sfs', sfs1), ('logreg', clf1)]) eclf = EnsembleVoteClassifier(clfs=[clf1_pipe, clf2, clf3], voting='soft') params = {'pipeline__sfs__k_features': [1, 2, 3], 'pipeline__logreg__C': [1.0, 100.0], 'randomforestclassifier__n_estimators': [20, 200]} grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5) grid.fit(iris.data, iris.target) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) 0.953 +/- 0.01 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 1, 'randomforestclassifier__n_estimators': 20} 0.947 +/- 0.02 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 1, 'randomforestclassifier__n_estimators': 200} 0.953 +/- 0.01 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 2, 'randomforestclassifier__n_estimators': 20} 0.947 +/- 0.02 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 2, 'randomforestclassifier__n_estimators': 200} 0.953 +/- 0.01 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 20} 0.953 +/- 0.02 {'pipeline__logreg__C': 1.0, 
'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 200} 0.947 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 1, 'randomforestclassifier__n_estimators': 20} 0.953 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 1, 'randomforestclassifier__n_estimators': 200} 0.947 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 2, 'randomforestclassifier__n_estimators': 20} 0.947 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 2, 'randomforestclassifier__n_estimators': 200} 0.960 +/- 0.01 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 20} 0.953 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 200} The best parameters determined via GridSearchCV are: grid.best_params_ {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 20} Now, we assign these parameters to the ensemble voting classifier, fit the models on the complete training set, and perform a prediction on 3 samples from the Iris dataset. eclf = eclf.set_params(**grid.best_params_) eclf.fit(X, y).predict(X[[1, 51, 149]]) array([0, 1, 2]) Manual Approach Alternatively, we can select different columns \"manually\" using the ColumnSelector object. In this example, we select only the first (sepal length) and third (petal length) columns for the logistic regression classifier ( clf1 ). from mlxtend.feature_selection import ColumnSelector col_sel = ColumnSelector(cols=[0, 2]) clf1_pipe = Pipeline([('sel', col_sel), ('logreg', clf1)]) eclf = EnsembleVoteClassifier(clfs=[clf1_pipe, clf2, clf3], voting='soft') eclf.fit(X, y).predict(X[[1, 51, 149]]) array([0, 1, 2]) Furthermore, we can fit the SequentialFeatureSelector separately, outside the grid search hyperparameter optimization pipeline.
Here, we determine the best features first, and then we construct a pipeline using these \"fixed,\" best features as the seed for the ColumnSelector : sfs1 = SequentialFeatureSelector(clf1, k_features=2, forward=True, floating=False, scoring='accuracy', verbose=1, cv=0) sfs1.fit(X, y) print('Best features', sfs1.k_feature_idx_) col_sel = ColumnSelector(cols=sfs1.k_feature_idx_) clf1_pipe = Pipeline([('sel', col_sel), ('logreg', clf1)]) [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished Features: 1/2[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished Features: 2/2 Best features (0, 2) eclf = EnsembleVoteClassifier(clfs=[clf1_pipe, clf2, clf3], voting='soft') eclf.fit(X, y).predict(X[[1, 51, 149]]) array([0, 1, 2]) Example 5 - Using Pre-fitted Classifiers from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target Assume that we previously fitted our classifiers: from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier import numpy as np clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() for clf in (clf1, clf2, clf3): clf.fit(X, y) By setting refit=False , the EnsembleVoteClassifier will not re-fit these classifiers, which saves computational time: from mlxtend.classifier import EnsembleVoteClassifier import copy eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[1,1,1], refit=False) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'Ensemble'] eclf.fit(X, y) print('accuracy:', np.mean(y == eclf.predict(X))) accuracy: 0.973333333333 However, please note that refit=False is incompatible with any form of cross-validation that is done in, e.g., model_selection.cross_val_score or model_selection.GridSearchCV , etc., since it would require the classifiers to be refit to the training folds.
Thus, only use refit=False if you want to make a prediction directly without cross-validation. Example 6 - Ensembles of Classifiers that Operate on Different Feature Subsets If desired, the different classifiers can be fit to different subsets of features in the training dataset. The following example illustrates how this can be done on a technical level using scikit-learn pipelines and the ColumnSelector : from sklearn.datasets import load_iris from mlxtend.classifier import EnsembleVoteClassifier from mlxtend.feature_selection import ColumnSelector from sklearn.pipeline import make_pipeline from sklearn.linear_model import LogisticRegression iris = load_iris() X = iris.data y = iris.target pipe1 = make_pipeline(ColumnSelector(cols=(0, 2)), LogisticRegression()) pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)), LogisticRegression()) eclf = EnsembleVoteClassifier(clfs=[pipe1, pipe2]) eclf.fit(X, y) EnsembleVoteClassifier(clfs=[Pipeline(memory=None, steps=[('columnselector', ColumnSelector(cols=(0, 2), drop_axis=False)), ('logisticregression', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l...='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False))])], refit=True, verbose=0, voting='hard', weights=None) Example 7 - A Note about Scikit-Learn SVMs and Soft Voting This section provides some additional technical insights into how probabilities are used when voting='soft' . Note that scikit-learn estimates the probabilities for SVMs (more info here: http://scikit-learn.org/stable/modules/svm.html#scores-probabilities) in a way that these may not be consistent with the class labels that the SVM predicts. This is an extreme example, but let's say we have a dataset with 3 class labels, 0, 1, and 2. For a given training example, the SVM classifier may predict class 2.
However, the class-membership probabilities may look as follows: class 0: 99% class 1: 0.5% class 2: 0.5% A practical example of this scenario is shown below: import numpy as np from mlxtend.classifier import EnsembleVoteClassifier from sklearn.svm import SVC from sklearn.datasets import load_iris iris = load_iris() X, y = iris.data, iris.target clf2 = SVC(probability=True, random_state=4) clf2.fit(X, y) eclf = EnsembleVoteClassifier(clfs=[clf2], voting='soft', refit=False) eclf.fit(X, y) for svm_class, e_class, svm_prob, e_prob, in zip(clf2.predict(X), eclf.predict(X), clf2.predict_proba(X), eclf.predict_proba(X)): if svm_class != e_class: print('============') print('Probas from SVM :', svm_prob) print('Class from SVM :', svm_class) print('Probas from SVM in Ensemble:', e_prob) print('Class from SVM in Ensemble :', e_class) print('============') ============ Probas from SVM : [ 0.01192489 0.47662663 0.51144848] Class from SVM : 1 Probas from SVM in Ensemble: [ 0.01192489 0.47662663 0.51144848] Class from SVM in Ensemble : 2 ============ Based on the probabilities, we would expect the SVM to predict class 2, because it has the highest probability. Since the EnsembleVoteClassifier uses the argmax function internally if voting='soft' , it would indeed predict class 2 in this case even if the ensemble consists of only one SVM model. In practice, this minor technical detail does not need to concern you, but it is useful to keep in mind if you are wondering why a 1-model SVM ensemble gives different results than that SVM alone; this is not a bug. API EnsembleVoteClassifier(clfs, voting='hard', weights=None, verbose=0, refit=True) Soft Voting/Majority Rule classifier for scikit-learn estimators. Parameters clfs : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the EnsembleVoteClassifier will fit clones of those original classifiers that will be stored in the class attribute self.clfs_ if refit=True (default).
voting : str, {'hard', 'soft'} (default='hard') If 'hard', uses predicted class labels for majority rule voting. Else if 'soft', predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. weights : array-like, shape = [n_classifiers], optional (default= None ) Sequence of weights ( float or int ) to weight the occurrences of predicted class labels ( hard voting) or class probabilities before averaging ( soft voting). Uses uniform weights if None . verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the clf being fitted - verbose=2 : Prints info about the parameters of the clf being fitted - verbose>2 : Changes verbose param of the underlying clf to self.verbose - 2 refit : bool (default: True) Refits classifiers in clfs if True; otherwise, uses references to the clfs (assumes that the classifiers were already fit). Note: refit=False is incompatible with most scikit-learn wrappers! For instance, if any form of cross-validation is performed, this would require re-fitting the classifiers to the training folds, which would raise a NotFittedError if refit=False. (New in mlxtend v0.6.)
Attributes classes_ : array-like, shape = [n_predictions] clfs : array-like, shape = [n_predictions] The unmodified input classifiers clfs_ : array-like, shape = [n_predictions] Fitted clones of the input classifiers Examples >>> import numpy as np >>> from sklearn.linear_model import LogisticRegression >>> from sklearn.naive_bayes import GaussianNB >>> from sklearn.ensemble import RandomForestClassifier >>> from mlxtend.classifier import EnsembleVoteClassifier >>> clf1 = LogisticRegression(random_state=1) >>> clf2 = RandomForestClassifier(random_state=1) >>> clf3 = GaussianNB() >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) >>> y = np.array([1, 1, 1, 2, 2, 2]) >>> eclf1 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... voting='hard', verbose=1) >>> eclf1 = eclf1.fit(X, y) >>> print(eclf1.predict(X)) [1 1 1 2 2 2] >>> eclf2 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting='soft') >>> eclf2 = eclf2.fit(X, y) >>> print(eclf2.predict(X)) [1 1 1 2 2 2] >>> eclf3 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... voting='soft', weights=[2,1,1]) >>> eclf3 = eclf3.fit(X, y) >>> print(eclf3.predict(X)) [1 1 1 2 2 2] >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/ Methods fit(X, y, sample_weight=None) Learn weight coefficients from training data for each classifier. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in the clfs list. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it.
Fits transformer to X and y with optional parameters **fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict class labels for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns maj : array-like, shape = [n_samples] Predicted class labels. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns avg : array-like, shape = [n_samples, n_classes] Weighted average probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X) Return class labels or probabilities for X for each estimator.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns If voting='soft' : array-like = [n_classifiers, n_samples, n_classes] Class probabilities calculated by each classifier. If voting='hard' : array-like = [n_classifiers, n_samples] Class labels predicted by each classifier.","title":"EnsembleVoteClassifier"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#ensemblevoteclassifier","text":"Implementation of a majority voting EnsembleVoteClassifier for classification. from mlxtend.classifier import EnsembleVoteClassifier","title":"EnsembleVoteClassifier"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#overview","text":"The EnsembleVoteClassifier is a meta-classifier for combining similar or conceptually different machine learning classifiers for classification via majority or plurality voting. (For simplicity, we will refer to both majority and plurality voting as majority voting.) The EnsembleVoteClassifier implements \"hard\" and \"soft\" voting. In hard voting, we predict the final class label as the class label that has been predicted most frequently by the classification models. In soft voting, we predict the class labels by averaging the class-probabilities (only recommended if the classifiers are well-calibrated). Note If you are interested in using the EnsembleVoteClassifier , please note that it is now also available through scikit-learn (>=0.17) as VotingClassifier .","title":"Overview"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#majority-voting-hard-voting","text":"Hard voting is the simplest case of majority voting.
Here, we predict the class label \\hat{y} via majority (plurality) voting of each classifier C_j : \\hat{y}=mode\\{C_1(\\mathbf{x}), C_2(\\mathbf{x}), ..., C_m(\\mathbf{x})\\} Assuming that we combine three classifiers that classify a training sample as follows: classifier 1 -> class 0 classifier 2 -> class 0 classifier 3 -> class 1 \\hat{y}=mode\\{0, 0, 1\\} = 0 Via majority vote, we would classify the sample as \"class 0.\"","title":"Majority Voting / Hard Voting"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#weighted-majority-vote","text":"In addition to the simple majority vote (hard voting) as described in the previous section, we can compute a weighted majority vote by associating a weight w_j with classifier C_j : \\hat{y} = \\arg \\max_i \\sum^{m}_{j=1} w_j \\chi_A \\big(C_j(\\mathbf{x})=i\\big), where \\chi_A is the characteristic function [C_j(\\mathbf{x}) = i \\; \\in A] , and A is the set of unique class labels. Continuing with the example from the previous section: classifier 1 -> class 0 classifier 2 -> class 0 classifier 3 -> class 1 assigning the weights {0.2, 0.2, 0.6} would yield a prediction \\hat{y} = 1 : \\arg \\max_i [0.2 \\times i_0 + 0.2 \\times i_0 + 0.6 \\times i_1] = 1","title":"Weighted Majority Vote"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#soft-voting","text":"In soft voting, we predict the class labels based on the predicted probabilities p_{ij} of each classifier C_j; this approach is only recommended if the classifiers are well-calibrated. \\hat{y} = \\arg \\max_i \\sum^{m}_{j=1} w_j p_{ij}, where w_j is the weight that can be assigned to the j-th classifier.
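The weighted hard vote and the soft vote defined above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code taken from the mlxtend source. The variable names are made up, and the numbers reproduce the three-classifier example used in this guide. np.bincount with weights sums the weights w_j of the classifiers voting for each class. np.average divides the weighted sum of probabilities by the sum of the weights, which rescales the scores but does not change the argmax.

```python
import numpy as np

# Class labels predicted for one sample by three classifiers (illustrative values)
labels = np.array([0, 0, 1])

# Unweighted hard vote: the most frequent label (the mode)
print(np.argmax(np.bincount(labels)))  # 0

# Weighted hard vote: per class, sum the weights of the classifiers voting for it
weights = np.array([0.2, 0.2, 0.6])
print(np.argmax(np.bincount(labels, weights=weights)))  # 1

# Class-membership probabilities p_ij from the same three classifiers (rows = classifiers)
probas = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.4, 0.6]])

# Soft vote with uniform weights: average probabilities are [0.7, 0.3]
avg = np.average(probas, axis=0)
print(np.argmax(avg))  # 0

# Soft vote with weights {0.1, 0.1, 0.8}: weighted probabilities are [0.49, 0.51]
avg_w = np.average(probas, axis=0, weights=[0.1, 0.1, 0.8])
print(np.argmax(avg_w))  # 1
```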
Assuming the example in the previous section was a binary classification task with class labels i \\in \\{0, 1\\} , our ensemble could make the following prediction: C_1(\\mathbf{x}) \\rightarrow [0.9, 0.1] C_2(\\mathbf{x}) \\rightarrow [0.8, 0.2] C_3(\\mathbf{x}) \\rightarrow [0.4, 0.6] Using uniform weights, we compute the average probabilities: p(i_0 \\mid \\mathbf{x}) = \\frac{0.9 + 0.8 + 0.4}{3} = 0.7 \\\\\\\\ p(i_1 \\mid \\mathbf{x}) = \\frac{0.1 + 0.2 + 0.6}{3} = 0.3 \\hat{y} = \\arg \\max_i \\big[p(i_0 \\mid \\mathbf{x}), p(i_1 \\mid \\mathbf{x}) \\big] = 0 However, assigning the weights {0.1, 0.1, 0.8} would yield a prediction \\hat{y} = 1 : p(i_0 \\mid \\mathbf{x}) = {0.1 \\times 0.9 + 0.1 \\times 0.8 + 0.8 \\times 0.4} = 0.49 \\\\\\\\ p(i_1 \\mid \\mathbf{x}) = {0.1 \\times 0.1 + 0.1 \\times 0.2 + 0.8 \\times 0.6} = 0.51 \\hat{y} = \\arg \\max_i \\big[p(i_0 \\mid \\mathbf{x}), p(i_1 \\mid \\mathbf{x}) \\big] = 1","title":"Soft Voting"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#references","text":"[1] S. Raschka. Python Machine Learning .
Packt Publishing Ltd., 2015.","title":"References"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#example-1-classifying-iris-flowers-using-different-classification-models","text":"from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier import numpy as np clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() print('5-fold cross validation:\\n') labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes'] for clf, label in zip([clf1, clf2, clf3], labels): scores = model_selection.cross_val_score(clf, X, y, cv=5, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 5-fold cross validation: Accuracy: 0.90 (+/- 0.05) [Logistic Regression] Accuracy: 0.93 (+/- 0.05) [Random Forest] Accuracy: 0.91 (+/- 0.04) [Naive Bayes] from mlxtend.classifier import EnsembleVoteClassifier eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[1,1,1]) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'Ensemble'] for clf, label in zip([clf1, clf2, clf3, eclf], labels): scores = model_selection.cross_val_score(clf, X, y, cv=5, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) Accuracy: 0.90 (+/- 0.05) [Logistic Regression] Accuracy: 0.93 (+/- 0.05) [Random Forest] Accuracy: 0.91 (+/- 0.04) [Naive Bayes] Accuracy: 0.95 (+/- 0.05) [Ensemble]","title":"Example 1 - Classifying Iris Flowers Using Different Classification Models"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#plotting-decision-regions","text":"import matplotlib.pyplot as plt from mlxtend.plotting import plot_decision_regions import matplotlib.gridspec as gridspec import itertools gs = 
gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'Ensemble'] for clf, lab, grd in zip([clf1, clf2, clf3, eclf], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf) plt.title(lab)","title":"Plotting Decision Regions"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#example-2-grid-search","text":"from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target from sklearn.model_selection import GridSearchCV from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import EnsembleVoteClassifier clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting='soft') params = {'logisticregression__C': [1.0, 100.0], 'randomforestclassifier__n_estimators': [20, 200],} grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5) grid.fit(iris.data, iris.target) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) 0.953 +/- 0.01 {'logisticregression__C': 1.0,
'randomforestclassifier__n_estimators': 20} 0.960 +/- 0.01 {'logisticregression__C': 1.0, 'randomforestclassifier__n_estimators': 200} 0.960 +/- 0.01 {'logisticregression__C': 100.0, 'randomforestclassifier__n_estimators': 20} 0.953 +/- 0.02 {'logisticregression__C': 100.0, 'randomforestclassifier__n_estimators': 200} Note : If the EnsembleVoteClassifier is initialized with multiple estimator objects of the same type, the estimator names are modified with consecutive integer indices, for example: clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) eclf = EnsembleVoteClassifier(clfs=[clf1, clf1, clf2], voting='soft') params = {'logisticregression-1__C': [1.0, 100.0], 'logisticregression-2__C': [1.0, 100.0], 'randomforestclassifier__n_estimators': [20, 200],} grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5) grid = grid.fit(iris.data, iris.target) Note The EnsembleVoteClassifier also enables grid search over the clfs argument. However, due to the current implementation of GridSearchCV in scikit-learn, it is not possible to search over both different classifiers and classifier parameters at the same time. For instance, while the following parameter dictionary works, params = {'randomforestclassifier__n_estimators': [1, 100], 'clfs': [(clf1, clf1, clf1), (clf2, clf3)]} it will use the instance settings of clf1 , clf2 , and clf3 and not overwrite them with the 'n_estimators' settings from 'randomforestclassifier__n_estimators': [1, 100] .","title":"Example 2 - Grid Search"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#example-3-majority-voting-with-classifiers-trained-on-different-feature-subsets","text":"Feature selection algorithms implemented in scikit-learn as well as the SequentialFeatureSelector implement a transform method that passes the reduced feature subset to the next item in a Pipeline .
For example, the method def transform(self, X): return X[:, self.k_feature_idx_] returns the best feature columns, k_feature_idx_ , given a dataset X. Thus, we simply need to construct a Pipeline consisting of the feature selector and the classifier in order to select different feature subsets for different algorithms. During fitting , the optimal feature subsets are automatically determined via the GridSearchCV object; when predict is called, the fitted feature selector in the pipeline passes along only the columns that resulted in the best performance for the respective classifier. from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, :], iris.target from sklearn.model_selection import GridSearchCV from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import EnsembleVoteClassifier from sklearn.pipeline import Pipeline from mlxtend.feature_selection import SequentialFeatureSelector clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() # Creating a feature-selection-classifier pipeline sfs1 = SequentialFeatureSelector(clf1, k_features=4, forward=True, floating=False, scoring='accuracy', verbose=0, cv=0) clf1_pipe = Pipeline([('sfs', sfs1), ('logreg', clf1)]) eclf = EnsembleVoteClassifier(clfs=[clf1_pipe, clf2, clf3], voting='soft') params = {'pipeline__sfs__k_features': [1, 2, 3], 'pipeline__logreg__C': [1.0, 100.0], 'randomforestclassifier__n_estimators': [20, 200]} grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5) grid.fit(iris.data, iris.target) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) 0.953 +/- 0.01 {'pipeline__logreg__C': 1.0,
'pipeline__sfs__k_features': 1, 'randomforestclassifier__n_estimators': 20} 0.947 +/- 0.02 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 1, 'randomforestclassifier__n_estimators': 200} 0.953 +/- 0.01 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 2, 'randomforestclassifier__n_estimators': 20} 0.947 +/- 0.02 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 2, 'randomforestclassifier__n_estimators': 200} 0.953 +/- 0.01 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 20} 0.953 +/- 0.02 {'pipeline__logreg__C': 1.0, 'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 200} 0.947 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 1, 'randomforestclassifier__n_estimators': 20} 0.953 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 1, 'randomforestclassifier__n_estimators': 200} 0.947 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 2, 'randomforestclassifier__n_estimators': 20} 0.947 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 2, 'randomforestclassifier__n_estimators': 200} 0.960 +/- 0.01 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 20} 0.953 +/- 0.02 {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 200} The best parameters determined via GridSearch are: grid.best_params_ {'pipeline__logreg__C': 100.0, 'pipeline__sfs__k_features': 3, 'randomforestclassifier__n_estimators': 20} Now, we assign these parameters to the ensemble voting classifier, fit the models on the complete training set, and perform a prediction on 3 samples from the Iris dataset. 
eclf = eclf.set_params(**grid.best_params_) eclf.fit(X, y).predict(X[[1, 51, 149]]) array([0, 1, 2])","title":"Example 3 - Majority voting with classifiers trained on different feature subsets"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#manual-approach","text":"Alternatively, we can select different columns \"manually\" using the ColumnSelector object. In this example, we select only the first (sepal length) and third (petal length) columns for the logistic regression classifier ( clf1 ). from mlxtend.feature_selection import ColumnSelector col_sel = ColumnSelector(cols=[0, 2]) clf1_pipe = Pipeline([('sel', col_sel), ('logreg', clf1)]) eclf = EnsembleVoteClassifier(clfs=[clf1_pipe, clf2, clf3], voting='soft') eclf.fit(X, y).predict(X[[1, 51, 149]]) array([0, 1, 2]) Furthermore, we can fit the SequentialFeatureSelector separately, outside the grid search hyperparameter optimization pipeline. Here, we determine the best features first, and then we construct a pipeline using these \"fixed,\" best features as the seed for the ColumnSelector : sfs1 = SequentialFeatureSelector(clf1, k_features=2, forward=True, floating=False, scoring='accuracy', verbose=1, cv=0) sfs1.fit(X, y) print('Best features', sfs1.k_feature_idx_) col_sel = ColumnSelector(cols=sfs1.k_feature_idx_) clf1_pipe = Pipeline([('sel', col_sel), ('logreg', clf1)]) [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished Features: 1/2[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished Features: 2/2 Best features (0, 2) eclf = EnsembleVoteClassifier(clfs=[clf1_pipe, clf2, clf3], voting='soft') eclf.fit(X, y).predict(X[[1, 51, 149]]) array([0, 1, 2])","title":"Manual Approach"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers","text":"from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target Assume that we previously fitted our classifiers: from sklearn import model_selection from
sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier import numpy as np clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() for clf in (clf1, clf2, clf3): clf.fit(X, y) By setting refit=False , the EnsembleVoteClassifier will not re-fit these classifiers, which saves computational time: from mlxtend.classifier import EnsembleVoteClassifier import copy eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[1,1,1], refit=False) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'Ensemble'] eclf.fit(X, y) print('accuracy:', np.mean(y == eclf.predict(X))) accuracy: 0.973333333333 However, please note that refit=False is incompatible with any form of cross-validation that is done in, e.g., model_selection.cross_val_score or model_selection.GridSearchCV , etc., since it would require the classifiers to be refit to the training folds. Thus, only use refit=False if you want to make a prediction directly without cross-validation.","title":"Example 5 - Using Pre-fitted Classifiers"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#example-6-ensembles-of-classifiers-that-operate-on-different-feature-subsets","text":"If desired, the different classifiers can be fit to different subsets of features in the training dataset.
The following example illustrates how this can be done on a technical level using scikit-learn pipelines and the ColumnSelector : from sklearn.datasets import load_iris from mlxtend.classifier import EnsembleVoteClassifier from mlxtend.feature_selection import ColumnSelector from sklearn.pipeline import make_pipeline from sklearn.linear_model import LogisticRegression iris = load_iris() X = iris.data y = iris.target pipe1 = make_pipeline(ColumnSelector(cols=(0, 2)), LogisticRegression()) pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)), LogisticRegression()) eclf = EnsembleVoteClassifier(clfs=[pipe1, pipe2]) eclf.fit(X, y) EnsembleVoteClassifier(clfs=[Pipeline(memory=None, steps=[('columnselector', ColumnSelector(cols=(0, 2), drop_axis=False)), ('logisticregression', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l...='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False))])], refit=True, verbose=0, voting='hard', weights=None)","title":"Example 6 - Ensembles of Classifiers that Operate on Different Feature Subsets"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#example-7-a-note-about-scikit-learn-svms-and-soft-voting","text":"This section provides some additional technical insights into how probabilities are used when voting='soft' . Note that scikit-learn estimates the probabilities for SVMs (more info here: http://scikit-learn.org/stable/modules/svm.html#scores-probabilities) in a way that these may not be consistent with the class labels that the SVM predicts. This is an extreme example, but let's say we have a dataset with 3 class labels, 0, 1, and 2. For a given training example, the SVM classifier may predict class 2.
However, the class-membership probabilities may look as follows: class 0: 99% class 1: 0.5% class 2: 0.5% A practical example of this scenario is shown below: import numpy as np from mlxtend.classifier import EnsembleVoteClassifier from sklearn.svm import SVC from sklearn.datasets import load_iris iris = load_iris() X, y = iris.data, iris.target clf2 = SVC(probability=True, random_state=4) clf2.fit(X, y) eclf = EnsembleVoteClassifier(clfs=[clf2], voting='soft', refit=False) eclf.fit(X, y) for svm_class, e_class, svm_prob, e_prob, in zip(clf2.predict(X), eclf.predict(X), clf2.predict_proba(X), eclf.predict_proba(X)): if svm_class != e_class: print('============') print('Probas from SVM :', svm_prob) print('Class from SVM :', svm_class) print('Probas from SVM in Ensemble:', e_prob) print('Class from SVM in Ensemble :', e_class) print('============') ============ Probas from SVM : [ 0.01192489 0.47662663 0.51144848] Class from SVM : 1 Probas from SVM in Ensemble: [ 0.01192489 0.47662663 0.51144848] Class from SVM in Ensemble : 2 ============ Based on the probabilities, we would expect the SVM to predict class 2, because it has the highest probability. Since the EnsembleVoteClassifier uses the argmax function internally if voting='soft' , it would indeed predict class 2 in this case even if the ensemble consists of only one SVM model. Note that in practice, this minor technical detail does not need to concern you, but it is useful to keep it in mind in case you are wondering about results from a 1-model SVM ensemble compared to that SVM alone -- this is not a bug.","title":"Example 7 - A Note about Scikit-Learn SVMs and Soft Voting"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#api","text":"EnsembleVoteClassifier(clfs, voting='hard', weights=None, verbose=0, refit=True) Soft Voting/Majority Rule classifier for scikit-learn estimators. Parameters clfs : array-like, shape = [n_classifiers] A list of classifiers. 
Invoking the fit method on the EnsembleVoteClassifier will fit clones of those original classifiers, which will be stored in the class attribute self.clfs_ if refit=True (default). voting : str, {'hard', 'soft'} (default='hard') If 'hard', uses predicted class labels for majority rule voting. Else if 'soft', predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. weights : array-like, shape = [n_classifiers], optional (default= None ) Sequence of weights ( float or int ) to weight the occurrences of predicted class labels ( hard voting) or class probabilities before averaging ( soft voting). Uses uniform weights if None . verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the clf being fitted - verbose=2 : Prints info about the parameters of the clf being fitted - verbose>2 : Changes verbose param of the underlying clf to self.verbose - 2 refit : bool (default: True) Refits the classifiers in clfs if True; otherwise, uses references to the clfs (assumes that the classifiers were already fit). Note: refit=False is incompatible with most scikit-learn wrappers! For instance, if any form of cross-validation is performed, this would require re-fitting the classifiers to the training folds, which would raise a NotFittedError if refit=False. (New in mlxtend v0.6.) 
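As a rough illustration of the voting and weights parameters, the NumPy sketch below implements weighted hard voting and weighted soft voting. It runs on made-up predictions from three hypothetical classifiers. It is a simplified sketch under these assumptions, not the EnsembleVoteClassifier source, and the helper names hard_vote and soft_vote are hypothetical.

```python
import numpy as np

def hard_vote(labels, weights=None):
    # labels: shape [n_classifiers, n_samples], non-negative integer class labels
    labels = np.asarray(labels)
    w = np.ones(labels.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    # For each sample (column), add up the weights of the classifiers voting for each label
    return np.apply_along_axis(
        lambda col: np.argmax(np.bincount(col, weights=w)), axis=0, arr=labels)

def soft_vote(probas, weights=None):
    # probas: shape [n_classifiers, n_samples, n_classes]
    # Weighted average of the class probabilities, then the argmax per sample
    avg = np.average(np.asarray(probas, dtype=float), axis=0, weights=weights)
    return np.argmax(avg, axis=1)

# Hypothetical class labels predicted by 3 classifiers for 2 samples
labels = [[0, 1],
          [0, 2],
          [1, 2]]
print(hard_vote(labels))                     # [0 2]
print(hard_vote(labels, weights=[3, 1, 1]))  # [0 1]

# Hypothetical class probabilities from 3 classifiers for 1 sample
probas = [[[0.2, 0.5, 0.3]],
          [[0.6, 0.3, 0.1]],
          [[0.4, 0.2, 0.4]]]
print(soft_vote(probas))                     # [0]
print(soft_vote(probas, weights=[3, 1, 1]))  # [1]
```

With uniform weights, two of the three votes assign the second sample to class 2. Giving the first classifier a weight of 3 flips that sample to class 1. Soft voting behaves the same way on the averaged probabilities.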
Attributes classes_ : array-like, shape = [n_predictions] clfs : array-like, shape = [n_predictions] The unmodified input classifiers clfs_ : array-like, shape = [n_predictions] Fitted clones of the input classifiers Examples >>> import numpy as np >>> from sklearn.linear_model import LogisticRegression >>> from sklearn.naive_bayes import GaussianNB >>> from sklearn.ensemble import RandomForestClassifier >>> from mlxtend.classifier import EnsembleVoteClassifier >>> clf1 = LogisticRegression(random_state=1) >>> clf2 = RandomForestClassifier(random_state=1) >>> clf3 = GaussianNB() >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) >>> y = np.array([1, 1, 1, 2, 2, 2]) >>> eclf1 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... voting='hard', verbose=1) >>> eclf1 = eclf1.fit(X, y) >>> print(eclf1.predict(X)) [1 1 1 2 2 2] >>> eclf2 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting='soft') >>> eclf2 = eclf2.fit(X, y) >>> print(eclf2.predict(X)) [1 1 1 2 2 2] >>> eclf3 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], ... voting='soft', weights=[2,1,1]) >>> eclf3 = eclf3.fit(X, y) >>> print(eclf3.predict(X)) [1 1 1 2 2 2] >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/","title":"API"},{"location":"user_guide/classifier/EnsembleVoteClassifier/#methods","text":"fit(X, y, sample_weight=None) Learn weight coefficients from training data for each classifier. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in clfs. Raises an error if some classifier does not support sample_weight in the fit() method. 
Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict class labels for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns maj : array-like, shape = [n_samples] Predicted class labels. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns avg : array-like, shape = [n_samples, n_classes] Weighted average probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. 
Returns self transform(X) Return class labels or probabilities for X for each estimator. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns If voting='soft' : array-like = [n_classifiers, n_samples, n_classes] Class probabilities calculated by each classifier. If voting='hard' : array-like = [n_classifiers, n_samples] Class labels predicted by each classifier.","title":"Methods"},{"location":"user_guide/classifier/LogisticRegression/","text":"Logistic Regression A logistic regression class for binary classification tasks. from mlxtend.classifier import LogisticRegression Overview Related to the Perceptron and 'Adaline' , a Logistic Regression model is a linear model for binary classification. However, instead of using a linear activation and minimizing the sum of squared errors (SSE) as in Adaline, we pass the net input through a sigmoid activation function, i.e., the logistic function: \\phi(z) = \\frac{1}{1 + e^{-z}}, where z is defined as the net input z = w_0x_0 + w_1x_1 + ... + w_mx_m = \\sum_{j=0}^{m} w_j x_j= \\mathbf{w}^T\\mathbf{x}. The net input is in turn based on the logit function logit(p(y=1 \\mid \\mathbf{x})) = z. Here, p(y=1 \\mid \\mathbf{x}) is the conditional probability that a particular sample belongs to class 1 given its features \\mathbf{x} . The logit function takes inputs in the range [0, 1] and transforms them to values over the entire real number range. In contrast, the logistic function takes input values over the entire real number range and transforms them to values in the range [0, 1]. In other words, the logistic function is the inverse of the logit function, and it lets us predict the conditional probability that a certain sample belongs to class 1 (or class 0). 
After model fitting, the conditional probability p(y=1 \\mid \\mathbf{x}) is converted to a binary class label via a threshold function g(\\cdot) : $$y = g({z}) = \\begin{cases} 1 & \\text{if $\\phi(z) \\ge 0.5$}\\\\ 0 & \\text{otherwise.} \\end{cases} $$ or equivalently: $$y = g({z}) = \\begin{cases} 1 & \\text{if z $\\ge$ 0}\\\\ 0 & \\text{otherwise}. \\end{cases} $$ Objective Function -- Log-Likelihood In order to parameterize a logistic regression model, we maximize the likelihood L(\\cdot) (or minimize the logistic cost function). We write the likelihood as L(\\mathbf{w}) = P(\\mathbf{y} \\mid \\mathbf{x};\\mathbf{w}) = \\prod_{i=1}^{n} P\\big(y^{(i)} \\mid x^{(i)}; \\mathbf{w}\\big) = \\prod^{n}_{i=1}\\bigg(\\phi\\big(z^{(i)}\\big)\\bigg)^{y^{(i)}} \\bigg(1-\\phi\\big(z^{(i)}\\big)\\bigg)^{1-y^{(i)}}, under the assumption that the training samples are independent of each other. In practice, it is easier to maximize the (natural) log of this equation, which is called the log-likelihood function: l(\\mathbf{w}) = \\log L(\\mathbf{w}) = \\sum^{n}_{i=1} y^{(i)} \\log \\bigg(\\phi\\big(z^{(i)}\\big)\\bigg) + \\big( 1 - y^{(i)}\\big) \\log \\big(1-\\phi\\big(z^{(i)}\\big)\\big) One advantage of taking the log is to avoid numeric underflow (and challenges with floating point math) for very small likelihoods. Another advantage is that we can obtain the derivative more easily, using the addition trick to rewrite the product of factors as a summation term, which we can then maximize using optimization algorithms such as gradient ascent. 
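The underflow argument can be checked numerically. The NumPy sketch below is not part of mlxtend. It evaluates both the likelihood and the log-likelihood for 2,000 hypothetical labels, each assigned a made-up predicted probability of 0.6 for class 1. The helper names are chosen here for illustration.

```python
import numpy as np

def likelihood(y, phi):
    # Product over samples of phi^y * (1 - phi)^(1 - y)
    return np.prod(phi ** y * (1.0 - phi) ** (1 - y))

def log_likelihood(y, phi):
    # Sum over samples of y * log(phi) + (1 - y) * log(1 - phi)
    return np.sum(y * np.log(phi) + (1 - y) * np.log(1.0 - phi))

y = np.array([1, 0] * 1000)      # 2,000 alternating class labels
phi = np.full(y.shape, 0.6)      # hypothetical predicted P(y=1 | x)

print(likelihood(y, phi))        # 0.0: the product underflows
print(log_likelihood(y, phi))    # a finite negative number, about -1427.1

# On a handful of samples, log(L) and l agree
print(np.isclose(np.log(likelihood(y[:10], phi[:10])),
                 log_likelihood(y[:10], phi[:10])))  # True
```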
Objective Function -- Logistic Cost Function As an alternative to maximizing the log-likelihood, we can define a cost function J(\\cdot) to be minimized; we rewrite the log-likelihood as: J(\\mathbf{w}) = \\sum_{i=1}^{n} - y^{(i)} log \\bigg( \\phi\\big(z^{(i)}\\big) \\bigg) - \\big(1 - y^{(i)}\\big) log\\bigg(1-\\phi\\big(z^{(i)}\\big)\\bigg) $$ J\\big(\\phi(z), y; \\mathbf{w}\\big) = \\begin{cases} -log\\big(\\phi(z) \\big) & \\text{if $y = 1$}\\\\ -log\\big(1- \\phi(z) \\big) & \\text{if $y = 0$} \\end{cases} $$ As we can see in the figure above, we penalize wrong predictions with an increasingly larger cost. Gradient Descent (GD) and Stochastic Gradient Descent (SGD) Optimization Gradient Ascent and the log-likelihood To learn the weight coefficients of a logistic regression model via gradient-based optimization, we compute the partial derivative of the log-likelihood function -- w.r.t. the j th weight -- as follows: \\frac{\\partial}{\\partial w_j} l(\\mathbf{w}) = \\bigg(y \\frac{1}{\\phi(z)} - (1-y) \\frac{1}{1-\\phi{(z)}} \\bigg) \\frac{\\partial}{\\partial w_j}\\phi(z) As an intermediate step, we compute the partial derivative of the sigmoid function, which will come in handy later: \\begin{align} &\\frac{\\partial}{\\partial z} \\phi(z) = \\frac{\\partial}{{\\partial z}} \\frac{1}{1+e^{-z}} \\\\\\\\ &= \\frac{1}{(1 + e^{-z})^{2}} e^{-z}\\\\\\\\ &= \\frac{1}{1+e^{-z}} \\bigg(1 - \\frac{1}{1+e^{-z}} \\bigg)\\\\\\\\ &= \\phi(z)\\big(1-\\phi(z)\\big) \\end{align} Now, we re-substitute \\frac{\\partial}{\\partial z} \\phi(z) = \\phi(z) \\big(1 - \\phi(z)\\big) back into the log-likelihood partial derivative equation and obtain the equation shown below: \\begin{align} & \\bigg(y \\frac{1}{\\phi{(z)}} - (1 - y) \\frac{1}{1 - \\phi(z)} \\bigg) \\frac{\\partial}{\\partial w_j} \\phi(z) \\\\\\\\ &= \\bigg(y \\frac{1}{\\phi{(z)}} - (1 - y) \\frac{1}{1 - \\phi(z)} \\bigg) \\phi(z) \\big(1 - \\phi(z)\\big) \\frac{\\partial}{\\partial w_j}z\\\\\\\\ &= 
\\big(y\\big(1-\\phi(z)\\big) - (1 - y) \\phi(z)\\big)x_j\\\\\\\\ &=\\big(y - \\phi(z)\\big)x_j \\end{align} Now, in order to find the weights of the model, we take a step proportional to the positive direction of the gradient to maximize the log-likelihood. Furthermore, we add a coefficient, the learning rate \\eta, to the weight update: \\begin{align} & w_j := w_j + \\eta \\frac{\\partial}{\\partial w_j} l(\\mathbf{w})\\\\\\\\ & w_j := w_j + \\eta \\sum^{n}_{i=1} \\big( y^{(i)} - \\phi\\big(z^{(i)}\\big)\\big)x_j^{(i)} \\end{align} Note that in gradient ascent/descent, the gradient (and weight update) is computed from all samples in the training set, in contrast to stochastic gradient ascent/descent. For more information about the differences between gradient descent and stochastic gradient descent, please see the related article Gradient Descent and Stochastic Gradient Descent . The previous equation shows the weight update for a single weight j . In gradient-based optimization, all weight coefficients are updated simultaneously; the weight update can be written more compactly as \\mathbf{w} := \\mathbf{w} + \\Delta\\mathbf{w}, where \\Delta{\\mathbf{w}} = \\eta \\nabla l(\\mathbf{w}) Gradient Descent and the logistic cost function In the previous section, we derived the gradient of the log-likelihood function, which can be optimized via gradient ascent. Similarly, we can obtain the cost gradient of the logistic cost function J(\\cdot) and minimize it via gradient descent in order to learn the logistic regression model. The update rule for a single weight: \\begin{align} & \\Delta{w_j} = -\\eta \\frac{\\partial J}{\\partial w_j} \\\\ & = \\eta \\sum_{i=1}^{n}\\big(y^{(i)} - \\phi\\big(z^{(i)}\\big)\\big) x_j^{(i)} \\end{align} The simultaneous weight update: \\mathbf{w} := \\mathbf{w} + \\Delta\\mathbf{w} where \\Delta{\\mathbf{w}} = - \\eta \\nabla J(\\mathbf{w}). 
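The gradient descent update can be sketched in a few lines of NumPy. This is a minimal full-batch illustration under simple assumptions: a bias column x_0 = 1, zero-initialized weights, a fixed learning rate, and a tiny made-up data set. It is not the mlxtend LogisticRegression implementation, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(w, X, y):
    # J(w) = sum_i [ -y_i log(phi_i) - (1 - y_i) log(1 - phi_i) ]
    phi = sigmoid(X @ w)
    return np.sum(-y * np.log(phi) - (1 - y) * np.log(1.0 - phi))

def fit_gd(X, y, eta=0.1, epochs=100):
    # Prepend a column of ones so that w[0] plays the role of w_0 (x_0 = 1)
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(X.shape[1])
    costs = []
    for _ in range(epochs):
        phi = sigmoid(X @ w)
        # Delta w = -eta * grad J(w) = eta * sum_i (y_i - phi_i) x_i, using all samples
        w = w + eta * X.T @ (y - phi)
        costs.append(logistic_cost(w, X, y))
    return w, costs

# Tiny, hypothetical, not linearly separable data set with one feature
X_toy = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y_toy = np.array([0, 0, 1, 0, 1, 1])
w, costs = fit_gd(X_toy, y_toy, eta=0.1, epochs=100)
print(w)
print(costs[0], costs[-1])  # the cost shrinks over the epochs
```

Summing the gradient over all samples, rather than averaging it, matches the update rule above. It also means the usable learning rate shrinks as the training set grows.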
Shuffling Random shuffling is implemented as: for one or more epochs randomly shuffle samples in the training set for training sample i compute gradients and perform weight updates Regularization As a way to tackle overfitting, we can add additional bias to the logistic regression model via a regularization term. Via the L2 regularization term, we reduce the complexity of the model by penalizing large weight coefficients: L2: \\frac{\\lambda}{2}\\lVert \\mathbf{w} \\rVert_2^2 = \\frac{\\lambda}{2} \\sum_{j=1}^{m} w_j^2 In order to apply regularization, we just need to add the regularization term to the cost function that we defined for logistic regression to shrink the weights: J(\\mathbf{w}) = \\sum_{i=1}^{n} \\Bigg[ - y^{(i)} log \\bigg( \\phi\\big(z^{(i)}\\big) \\bigg) - \\big(1 - y^{(i)}\\big) log\\bigg(1-\\phi\\big(z^{(i)}\\big)\\bigg) \\Bigg] + \\frac{\\lambda}{2} \\sum_{j=1}^{m} w_j^2 The update rule for a single weight: \\begin{align} & \\Delta{w_j} = -\\eta \\frac{\\partial J}{\\partial w_j}\\\\ & = \\eta \\sum_{i=1}^{n}\\big(y^{(i)} - \\phi\\big(z^{(i)}\\big)\\big) x_j^{(i)} - \\eta \\lambda w_j \\end{align} The simultaneous weight update: \\mathbf{w} := \\mathbf{w} + \\Delta\\mathbf{w} where \\Delta{\\mathbf{w}} = - \\eta \\nabla J(\\mathbf{w}), and the gradient of the regularized cost now contains the additional term \\lambda \\mathbf{w}. For more information on regularization, please see Regularization of Generalized Linear Models . References Bishop, Christopher M. Pattern recognition and machine learning . Springer, 2006. pp. 
203-213 Example 1 - Gradient Descent from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import LogisticRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() lr = LogisticRegression(eta=0.1, l2_lambda=0.0, epochs=100, minibatches=1, # for Gradient Descent random_seed=1, print_progress=3) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Logistic Regression - Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 100/100 | Cost 0.32 | Elapsed: 0:00:00 | ETA: 0:00:00 Predicting Class Labels y_pred = lr.predict(X) print('Last 3 Class Labels: %s' % y_pred[-3:]) Last 3 Class Labels: [1 1 1] Predicting Class Probabilities y_proba = lr.predict_proba(X) print('Last 3 Class Probabilities: %s' % y_proba[-3:]) Last 3 Class Probabilities: [ 0.99997968 0.99339873 0.99992707] Example 2 - Stochastic Gradient Descent from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import LogisticRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() lr = LogisticRegression(eta=0.5, epochs=30, l2_lambda=0.0, minibatches=len(y), # for SGD learning random_seed=1, print_progress=3) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Logistic Regression - Stochastic Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 30/30 
| Cost 0.27 | Elapsed: 0:00:00 | ETA: 0:00:00 Example 3 - Stochastic Gradient Descent w. Minibatches Here, we set minibatches to 5, which will result in Minibatch Learning with a batch size of 20 samples (since 100 Iris samples divided by 5 minibatches equals 20). from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import LogisticRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() lr = LogisticRegression(eta=0.5, epochs=30, l2_lambda=0.0, minibatches=5, # 100/5 = 20 -> minibatch-s random_seed=1, print_progress=3) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Logistic Regression - Stochastic Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 30/30 | Cost 0.25 | Elapsed: 0:00:00 | ETA: 0:00:00 API LogisticRegression(eta=0.01, epochs=50, l2_lambda=0.0, minibatches=1, random_seed=None, print_progress=0) Logistic regression classifier. Note that this implementation of Logistic Regression expects binary class labels in {0, 1}. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2_lambda : float Regularization parameter for L2 regularization. No regularization if l2_lambda=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. 
If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list List of floats with cross_entropy cost (sgd or gd) for every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/LogisticRegression/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns Class 1 probabilities : array-like, shape = [n_samples] score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Logistic Regression"},{"location":"user_guide/classifier/LogisticRegression/#logistic-regression","text":"A logistic regression class for binary classification tasks. from mlxtend.classifier import LogisticRegression","title":"Logistic Regression"},{"location":"user_guide/classifier/LogisticRegression/#overview","text":"Related to the Perceptron and 'Adaline' , a Logistic Regression model is a linear model for binary classification. However, instead of using a linear activation and minimizing the sum of squared errors (SSE) as in Adaline, we pass the net input through a sigmoid activation function, i.e., the logistic function: \\phi(z) = \\frac{1}{1 + e^{-z}}, where z is defined as the net input z = w_0x_0 + w_1x_1 + ... + w_mx_m = \\sum_{j=0}^{m} w_j x_j= \\mathbf{w}^T\\mathbf{x}. The net input is in turn based on the logit function logit(p(y=1 \\mid \\mathbf{x})) = z. Here, p(y=1 \\mid \\mathbf{x}) is the conditional probability that a particular sample belongs to class 1 given its features \\mathbf{x} . The logit function takes inputs in the range [0, 1] and transforms them to values over the entire real number range. In contrast, the logistic function takes input values over the entire real number range and transforms them to values in the range [0, 1]. In other words, the logistic function is the inverse of the logit function, and it lets us predict the conditional probability that a certain sample belongs to class 1 (or class 0). 
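The inverse relationship between the logistic and logit functions can be checked numerically with a small NumPy sketch. The function names logistic and logit are chosen here for illustration. The last line also checks that thresholding phi(z) at 0.5 is the same as thresholding z at 0.

```python
import numpy as np

def logistic(z):
    # phi(z) = 1 / (1 + exp(-z)): maps any real z into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # log-odds log(p / (1 - p)): maps p in (0, 1) back onto the real line
    return np.log(p / (1.0 - p))

z = np.array([-5.0, -1.0, 0.0, 2.0, 10.0])
p = logistic(z)
print(p)                                  # every value lies strictly between 0 and 1
print(np.allclose(logit(p), z))           # True: logit undoes logistic
print(np.array_equal(p >= 0.5, z >= 0))   # True: phi(z) >= 0.5 exactly when z >= 0
```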
After model fitting, the conditional probability p(y=1 \\mid \\mathbf{x}) is converted to a binary class label via a threshold function g(\\cdot) : $$y = g({z}) = \\begin{cases} 1 & \\text{if $\\phi(z) \\ge 0.5$}\\\\ 0 & \\text{otherwise.} \\end{cases} $$ or equivalently: $$y = g({z}) = \\begin{cases} 1 & \\text{if z $\\ge$ 0}\\\\ 0 & \\text{otherwise}. \\end{cases} $$","title":"Overview"},{"location":"user_guide/classifier/LogisticRegression/#objective-function-log-likelihood","text":"In order to parameterize a logistic regression model, we maximize the likelihood L(\\cdot) (or minimize the logistic cost function). We write the likelihood as L(\\mathbf{w}) = P(\\mathbf{y} \\mid \\mathbf{x};\\mathbf{w}) = \\prod_{i=1}^{n} P\\big(y^{(i)} \\mid x^{(i)}; \\mathbf{w}\\big) = \\prod^{n}_{i=1}\\bigg(\\phi\\big(z^{(i)}\\big)\\bigg)^{y^{(i)}} \\bigg(1-\\phi\\big(z^{(i)}\\big)\\bigg)^{1-y^{(i)}}, under the assumption that the training samples are independent of each other. In practice, it is easier to maximize the (natural) log of this equation, which is called the log-likelihood function: l(\\mathbf{w}) = \\log L(\\mathbf{w}) = \\sum^{n}_{i=1} y^{(i)} \\log \\bigg(\\phi\\big(z^{(i)}\\big)\\bigg) + \\big( 1 - y^{(i)}\\big) \\log \\big(1-\\phi\\big(z^{(i)}\\big)\\big) One advantage of taking the log is to avoid numeric underflow (and challenges with floating point math) for very small likelihoods. 
Another advantage is that we can obtain the derivative more easily, using the addition trick to rewrite the product of factors as a summation term, which we can then maximize using optimization algorithms such as gradient ascent.","title":"Objective Function -- Log-Likelihood"},{"location":"user_guide/classifier/LogisticRegression/#objective-function-logistic-cost-function","text":"As an alternative to maximizing the log-likelihood, we can define a cost function J(\\cdot) to be minimized; we rewrite the log-likelihood as: J(\\mathbf{w}) = \\sum_{i=1}^{n} - y^{(i)} log \\bigg( \\phi\\big(z^{(i)}\\big) \\bigg) - \\big(1 - y^{(i)}\\big) log\\bigg(1-\\phi\\big(z^{(i)}\\big)\\bigg) $$ J\\big(\\phi(z), y; \\mathbf{w}\\big) = \\begin{cases} -log\\big(\\phi(z) \\big) & \\text{if $y = 1$}\\\\ -log\\big(1- \\phi(z) \\big) & \\text{if $y = 0$} \\end{cases} $$ As we can see in the figure above, we penalize wrong predictions with an increasingly larger cost.","title":"Objective Function -- Logistic Cost Function"},{"location":"user_guide/classifier/LogisticRegression/#gradient-descent-gd-and-stochastic-gradient-descent-sgd-optimization","text":"","title":"Gradient Descent (GD) and Stochastic Gradient Descent (SGD) Optimization"},{"location":"user_guide/classifier/LogisticRegression/#gradient-ascent-and-the-log-likelihood","text":"To learn the weight coefficients of a logistic regression model via gradient-based optimization, we compute the partial derivative of the log-likelihood function -- w.r.t. 
the j th weight -- as follows: \\frac{\\partial}{\\partial w_j} l(\\mathbf{w}) = \\bigg(y \\frac{1}{\\phi(z)} - (1-y) \\frac{1}{1-\\phi{(z)}} \\bigg) \\frac{\\partial}{\\partial w_j}\\phi(z) As an intermediate step, we compute the partial derivative of the sigmoid function, which will come in handy later: \\begin{align} &\\frac{\\partial}{\\partial z} \\phi(z) = \\frac{\\partial}{{\\partial z}} \\frac{1}{1+e^{-z}} \\\\\\\\ &= \\frac{1}{(1 + e^{-z})^{2}} e^{-z}\\\\\\\\ &= \\frac{1}{1+e^{-z}} \\bigg(1 - \\frac{1}{1+e^{-z}} \\bigg)\\\\\\\\ &= \\phi(z)\\big(1-\\phi(z)\\big) \\end{align} Now, we re-substitute \\frac{\\partial}{\\partial z} \\phi(z) = \\phi(z) \\big(1 - \\phi(z)\\big) back into the log-likelihood partial derivative equation and obtain the equation shown below: \\begin{align} & \\bigg(y \\frac{1}{\\phi{(z)}} - (1 - y) \\frac{1}{1 - \\phi(z)} \\bigg) \\frac{\\partial}{\\partial w_j} \\phi(z) \\\\\\\\ &= \\bigg(y \\frac{1}{\\phi{(z)}} - (1 - y) \\frac{1}{1 - \\phi(z)} \\bigg) \\phi(z) \\big(1 - \\phi(z)\\big) \\frac{\\partial}{\\partial w_j}z\\\\\\\\ &= \\big(y\\big(1-\\phi(z)\\big) - (1 - y) \\phi(z)\\big)x_j\\\\\\\\ &=\\big(y - \\phi(z)\\big)x_j \\end{align} Now, in order to find the weights of the model, we take a step proportional to the positive direction of the gradient to maximize the log-likelihood. Furthermore, we add a coefficient, the learning rate \\eta, to the weight update: \\begin{align} & w_j := w_j + \\eta \\frac{\\partial}{\\partial w_j} l(\\mathbf{w})\\\\\\\\ & w_j := w_j + \\eta \\sum^{n}_{i=1} \\big( y^{(i)} - \\phi\\big(z^{(i)}\\big)\\big)x_j^{(i)} \\end{align} Note that in gradient ascent/descent, the gradient (and weight update) is computed from all samples in the training set, in contrast to stochastic gradient ascent/descent. For more information about the differences between gradient descent and stochastic gradient descent, please see the related article Gradient Descent and Stochastic Gradient Descent . 
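The derivation can be verified numerically. The NumPy sketch below compares the analytic gradient, the sum over samples of (y - phi(z)) x_j, with a central finite-difference approximation of the log-likelihood. It uses small random data and hypothetical helper names, so it checks the math rather than any particular library implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    phi = sigmoid(X @ w)
    return np.sum(y * np.log(phi) + (1 - y) * np.log(1.0 - phi))

def analytic_gradient(w, X, y):
    # dl/dw_j = sum_i (y_i - phi_i) x_ij, as derived above
    return X.T @ (y - sigmoid(X @ w))

def numeric_gradient(w, X, y, h=1e-6):
    # Central finite differences, perturbing one weight at a time
    grad = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = h
        grad[j] = (log_likelihood(w + e, X, y) - log_likelihood(w - e, X, y)) / (2 * h)
    return grad

rng = np.random.default_rng(42)
X = np.hstack([np.ones((8, 1)), rng.normal(size=(8, 2))])  # x_0 = 1 bias column
y = rng.integers(0, 2, size=8)
w = rng.normal(size=3)
print(np.allclose(analytic_gradient(w, X, y), numeric_gradient(w, X, y), atol=1e-5))
```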
The previous equation shows the weight update for a single weight j . In gradient-based optimization, all weight coefficients are updated simultaneously; the weight update can be written more compactly as \\mathbf{w} := \\mathbf{w} + \\Delta\\mathbf{w}, where \\Delta{\\mathbf{w}} = \\eta \\nabla l(\\mathbf{w})","title":"Gradient Ascent and the log-likelihood"},{"location":"user_guide/classifier/LogisticRegression/#gradient-descent-and-the-logistic-cost-function","text":"In the previous section, we derived the gradient of the log-likelihood function, which can be optimized via gradient ascent. Similarly, we can obtain the cost gradient of the logistic cost function J(\\cdot) and minimize it via gradient descent in order to learn the logistic regression model. The update rule for a single weight: \\begin{align} & \\Delta{w_j} = -\\eta \\frac{\\partial J}{\\partial w_j} \\\\ & = \\eta \\sum_{i=1}^{n}\\big(y^{(i)} - \\phi\\big(z^{(i)}\\big)\\big) x_j^{(i)} \\end{align} The simultaneous weight update: \\mathbf{w} := \\mathbf{w} + \\Delta\\mathbf{w} where \\Delta{\\mathbf{w}} = - \\eta \\nabla J(\\mathbf{w}).","title":"Gradient Descent and the logistic cost function"},{"location":"user_guide/classifier/LogisticRegression/#shuffling","text":"Random shuffling is implemented as: for one or more epochs randomly shuffle samples in the training set for training sample i compute gradients and perform weight updates","title":"Shuffling"},{"location":"user_guide/classifier/LogisticRegression/#regularization","text":"As a way to tackle overfitting, we can add additional bias to the logistic regression model via a regularization term. 
Via the L2 regularization term, we reduce the complexity of the model by penalizing large weight coefficients: L2: \\frac{\\lambda}{2}\\lVert \\mathbf{w} \\rVert_2^2 = \\frac{\\lambda}{2} \\sum_{j=1}^{m} w_j^2 In order to apply regularization, we just need to add the regularization term to the cost function that we defined for logistic regression to shrink the weights: J(\\mathbf{w}) = \\sum_{i=1}^{n} \\Bigg[ - y^{(i)} log \\bigg( \\phi\\big(z^{(i)}\\big) \\bigg) - \\big(1 - y^{(i)}\\big) log\\bigg(1-\\phi\\big(z^{(i)}\\big)\\bigg) \\Bigg] + \\frac{\\lambda}{2} \\sum_{j=1}^{m} w_j^2 The update rule for a single weight: \\begin{align} & \\Delta{w_j} = -\\eta \\frac{\\partial J}{\\partial w_j}\\\\ & = \\eta \\sum_{i=1}^{n}\\big(y^{(i)} - \\phi\\big(z^{(i)}\\big)\\big) x_j^{(i)} - \\eta \\lambda w_j \\end{align} The simultaneous weight update: \\mathbf{w} := \\mathbf{w} + \\Delta\\mathbf{w} where \\Delta{\\mathbf{w}} = - \\eta \\nabla J(\\mathbf{w}), and the gradient of the regularized cost now contains the additional term \\lambda \\mathbf{w}. For more information on regularization, please see Regularization of Generalized Linear Models .","title":"Regularization"},{"location":"user_guide/classifier/LogisticRegression/#references","text":"Bishop, Christopher M. Pattern recognition and machine learning . Springer, 2006. pp. 
203-213","title":"References"},{"location":"user_guide/classifier/LogisticRegression/#example-1-gradient-descent","text":"from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import LogisticRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() lr = LogisticRegression(eta=0.1, l2_lambda=0.0, epochs=100, minibatches=1, # for Gradient Descent random_seed=1, print_progress=3) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Logistic Regression - Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 100/100 | Cost 0.32 | Elapsed: 0:00:00 | ETA: 0:00:00","title":"Example 1 - Gradient Descent"},{"location":"user_guide/classifier/LogisticRegression/#predicting-class-labels","text":"y_pred = lr.predict(X) print('Last 3 Class Labels: %s' % y_pred[-3:]) Last 3 Class Labels: [1 1 1]","title":"Predicting Class Labels"},{"location":"user_guide/classifier/LogisticRegression/#predicting-class-probabilities","text":"y_proba = lr.predict_proba(X) print('Last 3 Class Probabilities: %s' % y_proba[-3:]) Last 3 Class Probabilities: [ 0.99997968 0.99339873 0.99992707]","title":"Predicting Class Probabilities"},{"location":"user_guide/classifier/LogisticRegression/#example-2-stochastic-gradient-descent","text":"from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import LogisticRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - 
X[:,1].mean()) / X[:,1].std() lr = LogisticRegression(eta=0.5, epochs=30, l2_lambda=0.0, minibatches=len(y), # for SGD learning random_seed=1, print_progress=3) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Logistic Regression - Stochastic Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 30/30 | Cost 0.27 | Elapsed: 0:00:00 | ETA: 0:00:00","title":"Example 2 - Stochastic Gradient Descent"},{"location":"user_guide/classifier/LogisticRegression/#example-3-stochastic-gradient-descent-w-minibatches","text":"Here, we set minibatches to 5, which will result in Minibatch Learning with a batch size of 20 samples (since 100 Iris samples divided by 5 minibatches equals 20). from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import LogisticRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() lr = LogisticRegression(eta=0.5, epochs=30, l2_lambda=0.0, minibatches=5, # 100/5 = 20 -> minibatch-s random_seed=1, print_progress=3) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Logistic Regression - Stochastic Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 30/30 | Cost 0.25 | Elapsed: 0:00:00 | ETA: 0:00:00","title":"Example 3 - Stochastic Gradient Descent w. Minibatches"},{"location":"user_guide/classifier/LogisticRegression/#api","text":"LogisticRegression(eta=0.01, epochs=50, l2_lambda=0.0, minibatches=1, random_seed=None, print_progress=0) Logistic regression classifier. Note that this implementation of Logistic Regression expects binary class labels in {0, 1}. 
Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2_lambda : float Regularization parameter for L2 regularization. No regularization if l2_lambda=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list List of floats with cross_entropy cost (sgd or gd) for every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/LogisticRegression/","title":"API"},{"location":"user_guide/classifier/LogisticRegression/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class 1 probability : float score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Methods"},{"location":"user_guide/classifier/MultiLayerPerceptron/","text":"Neural Network - Multilayer Perceptron Implementation of a multilayer perceptron, a feedforward artificial neural network. from mlxtend.classifier import MultiLayerPerceptron Overview Although the code is fully working and can be used for common classification tasks, this implementation is not geared towards efficiency but clarity \u2013 the original code was written for demonstration purposes. Basic Architecture The neurons x_0 and a_0 represent the bias units ( x_0=1 , a_0=1 ). The i th superscript denotes the i th layer, and the j th subscript stands for the index of the respective unit. For example, a_{1}^{(2)} refers to the first activation unit after the bias unit (i.e., 2nd activation unit) in the 2nd layer (here: the hidden layer): \\begin{align} \\mathbf{a^{(2)}} &= \\begin{bmatrix} a_{0}^{(2)} \\\\ a_{1}^{(2)} \\\\ \\vdots \\\\ a_{m}^{(2)} \\end{bmatrix}. \\end{align} Each layer (l) in a multi-layer perceptron, a directed graph, is fully connected to the next layer (l+1) . We write the weight coefficient that connects the k th unit in the l th layer to the j th unit in layer l+1 as w^{(l)}_{j, k} .
For example, the weight coefficient that connects the units a_0^{(2)} \\rightarrow a_1^{(3)} would be written as w_{1,0}^{(2)} . Activation In the current implementation, the activations of the hidden layer(s) are computed via the logistic (sigmoid) function \\phi(z) = \\frac{1}{1 + e^{-z}}. (For more details on the logistic function, please see classifier.LogisticRegression ; a general overview of different activation functions can be found here .) Furthermore, the MLP uses the softmax function in the output layer. For more details on the softmax function, please see classifier.SoftmaxRegression . References D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors . Nature, 323:533\u2013536, 1986. C. M. Bishop. Neural networks for pattern recognition . Oxford University Press, 1995. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning , 2nd edition. Springer, 2009. Example 1 - Classifying Iris Flowers Load 2 features from Iris (sepal length and petal width) for visualization purposes: from mlxtend.data import iris_data X, y = iris_data() X = X[:, [0, 3]] # standardize training data X_std = (X - X.mean(axis=0)) / X.std(axis=0) Train a neural network for 3 output flower classes ('Setosa', 'Versicolor', 'Virginica') using regular gradient descent ( minibatches=1 ), 50 hidden units, and no regularization. Gradient Descent Setting the minibatches to 1 will result in gradient descent training; please see Gradient Descent vs. Stochastic Gradient Descent for details.
from mlxtend.classifier import MultiLayerPerceptron as MLP nn1 = MLP(hidden_layers=[50], l2=0.00, l1=0.0, epochs=150, eta=0.05, momentum=0.1, decrease_const=0.0, minibatches=1, random_seed=1, print_progress=3) nn1 = nn1.fit(X_std, y) Iteration: 150/150 | Cost 0.06 | Elapsed: 0:00:00 | ETA: 0:00:00 from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt fig = plot_decision_regions(X=X_std, y=y, clf=nn1, legend=2) plt.title('Multi-layer Perceptron w. 1 hidden layer (logistic sigmoid)') plt.show() import matplotlib.pyplot as plt plt.plot(range(len(nn1.cost_)), nn1.cost_) plt.ylabel('Cost') plt.xlabel('Epochs') plt.show() print('Accuracy: %.2f%%' % (100 * nn1.score(X_std, y))) Accuracy: 96.67% Stochastic Gradient Descent Setting minibatches to n_samples will result in stochastic gradient descent training; please see Gradient Descent vs. Stochastic Gradient Descent for details. nn2 = MLP(hidden_layers=[50], l2=0.00, l1=0.0, epochs=5, eta=0.005, momentum=0.1, decrease_const=0.0, minibatches=len(y), random_seed=1, print_progress=3) nn2.fit(X_std, y) plt.plot(range(len(nn2.cost_)), nn2.cost_) plt.ylabel('Cost') plt.xlabel('Epochs') plt.show() Iteration: 5/5 | Cost 0.11 | Elapsed: 00:00:00 | ETA: 00:00:00 Train for 25 epochs (note that fit re-initializes the weights by default; pass init_params=False to continue from the current weights)... nn2.epochs = 25 nn2 = nn2.fit(X_std, y) Iteration: 25/25 | Cost 0.07 | Elapsed: 0:00:00 | ETA: 0:00:00 plt.plot(range(len(nn2.cost_)), nn2.cost_) plt.ylabel('Cost') plt.xlabel('Epochs') plt.show() Example 2 - Classifying Handwritten Digits from a 10% MNIST Subset Load a 5000-sample subset of the MNIST dataset (please see data.loadlocal_mnist if you want to download and read in the complete MNIST dataset).
from mlxtend.data import mnist_data from mlxtend.preprocessing import shuffle_arrays_unison X, y = mnist_data() X, y = shuffle_arrays_unison((X, y), random_seed=1) X_train, y_train = X[:500], y[:500] X_test, y_test = X[500:], y[500:] Visualize a sample from the MNIST dataset to check if it was loaded correctly: import matplotlib.pyplot as plt def plot_digit(X, y, idx): img = X[idx].reshape(28,28) plt.imshow(img, cmap='Greys', interpolation='nearest') plt.title('true label: %d' % y[idx]) plt.show() plot_digit(X, y, 3500) Standardize pixel values: import numpy as np from mlxtend.preprocessing import standardize X_train_std, params = standardize(X_train, columns=range(X_train.shape[1]), return_params=True) X_test_std = standardize(X_test, columns=range(X_test.shape[1]), params=params) Initialize the neural network to recognize the 10 different digits (0-9) using 100 epochs and mini-batch learning. nn1 = MLP(hidden_layers=[150], l2=0.00, l1=0.0, epochs=100, eta=0.005, momentum=0.0, decrease_const=0.0, minibatches=100, random_seed=1, print_progress=3) Learn the features while printing the progress to get an idea about how long it may take. import matplotlib.pyplot as plt nn1.fit(X_train_std, y_train) plt.plot(range(len(nn1.cost_)), nn1.cost_) plt.ylabel('Cost') plt.xlabel('Epochs') plt.show() Iteration: 100/100 | Cost 0.01 | Elapsed: 0:00:17 | ETA: 0:00:00 print('Train Accuracy: %.2f%%' % (100 * nn1.score(X_train_std, y_train))) print('Test Accuracy: %.2f%%' % (100 * nn1.score(X_test_std, y_test))) Train Accuracy: 100.00% Test Accuracy: 84.62% Please note that this neural network has been trained on only 500 samples (10% of the 5000-sample MNIST subset) for technical demonstration purposes, hence the modest predictive performance.
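This minimal NumPy sketch makes the architecture and activation described earlier on this page concrete. It computes a forward pass through one hidden layer of logistic sigmoid units followed by a softmax output layer. It is not the mlxtend implementation. The function names, layer sizes, and random initialization are hypothetical. The bias units are kept as separate vectors rather than as x_0 = 1 and a_0 = 1, which is equivalent. The weight matrices are stored with shape (inputs, units), the transpose of the w^{(l)}_{j, k} indexing.

```python
import numpy as np


def sigmoid(z):
    # logistic (sigmoid) activation used for the hidden layer
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # subtract the row-wise maximum for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def forward(X, W_h, b_h, W_out, b_out):
    # hidden layer: net input, then sigmoid activations
    a_h = sigmoid(X.dot(W_h) + b_h)
    # output layer: net input, then softmax class probabilities
    return softmax(a_h.dot(W_out) + b_out)


# hypothetical sizes: 2 input features, 50 hidden units, 3 classes
rng = np.random.RandomState(1)
n_features, n_hidden, n_classes = 2, 50, 3
W_h = rng.normal(scale=0.1, size=(n_features, n_hidden))
b_h = np.zeros(n_hidden)
W_out = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b_out = np.zeros(n_classes)

X = rng.normal(size=(4, n_features))
proba = forward(X, W_h, b_h, W_out, b_out)
y_pred = proba.argmax(axis=1)  # predicted class label per sample
```

Each row of proba sums to one, and the predicted label is the column with the highest probability. Training would add backpropagation of the categorical cross-entropy, which this sketch omits.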
API MultiLayerPerceptron(eta=0.5, epochs=50, hidden_layers=[50], n_classes=None, momentum=0.0, l1=0.0, l2=0.0, dropout=1.0, decrease_const=0.0, minibatches=1, random_seed=None, print_progress=0) Multi-layer perceptron classifier with logistic sigmoid activations Parameters eta : float (default: 0.5) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. hidden_layers : list (default: [50]) Number of units per hidden layer. By default 50 units in the first hidden layer. At the moment only 1 hidden layer is supported n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. l1 : float (default: 0.0) L1 regularization strength l2 : float (default: 0.0) L2 regularization strength momentum : float (default: 0.0) Momentum constant. Factor multiplied with the gradient of the previous epoch t-1 to improve learning speed w(t) := w(t) - (grad(t) + momentum * grad(t-1)) decrease_const : float (default: 0.0) Decrease constant. Shrinks the learning rate after each epoch via eta / (1 + epoch*decrease_const) minibatches : int (default: 1) Divide the training data into k minibatches for accelerated stochastic gradient descent learning. Gradient Descent Learning if minibatches = 1 Stochastic Gradient Descent learning if minibatches = len(y) Minibatch learning if minibatches > 1 random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape=[n_features, n_classes] Weights after fitting. b_ : 1D-array, shape=[n_classes] Bias units after fitting. 
cost_ : list List of floats; the mean categorical cross entropy cost after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/MultiLayerPerceptron/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape= [n_samples, n_classes] score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Neural Network - Multilayer Perceptron"},{"location":"user_guide/classifier/MultiLayerPerceptron/#neural-network-multilayer-perceptron","text":"Implementation of a multilayer perceptron, a feedforward artificial neural network.
from mlxtend.classifier import MultiLayerPerceptron","title":"Neural Network - Multilayer Perceptron"},{"location":"user_guide/classifier/MultiLayerPerceptron/#overview","text":"Although the code is fully working and can be used for common classification tasks, this implementation is not geared towards efficiency but clarity \u2013 the original code was written for demonstration purposes.","title":"Overview"},{"location":"user_guide/classifier/MultiLayerPerceptron/#basic-architecture","text":"The neurons x_0 and a_0 represent the bias units ( x_0=1 , a_0=1 ). The i th superscript denotes the i th layer, and the j th subscript stands for the index of the respective unit. For example, a_{1}^{(2)} refers to the first activation unit after the bias unit (i.e., 2nd activation unit) in the 2nd layer (here: the hidden layer): \\begin{align} \\mathbf{a^{(2)}} &= \\begin{bmatrix} a_{0}^{(2)} \\\\ a_{1}^{(2)} \\\\ \\vdots \\\\ a_{m}^{(2)} \\end{bmatrix}. \\end{align} Each layer (l) in a multi-layer perceptron, a directed graph, is fully connected to the next layer (l+1) . We write the weight coefficient that connects the k th unit in the l th layer to the j th unit in layer l+1 as w^{(l)}_{j, k} . For example, the weight coefficient that connects the units a_0^{(2)} \\rightarrow a_1^{(3)} would be written as w_{1,0}^{(2)} .","title":"Basic Architecture"},{"location":"user_guide/classifier/MultiLayerPerceptron/#activation","text":"In the current implementation, the activations of the hidden layer(s) are computed via the logistic (sigmoid) function \\phi(z) = \\frac{1}{1 + e^{-z}}. (For more details on the logistic function, please see classifier.LogisticRegression ; a general overview of different activation functions can be found here .)
Furthermore, the MLP uses the softmax function in the output layer. For more details on the softmax function, please see classifier.SoftmaxRegression .","title":"Activation"},{"location":"user_guide/classifier/MultiLayerPerceptron/#references","text":"D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors . Nature, 323:533\u2013536, 1986. C. M. Bishop. Neural networks for pattern recognition . Oxford University Press, 1995. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning , 2nd edition. Springer, 2009.","title":"References"},{"location":"user_guide/classifier/MultiLayerPerceptron/#example-1-classifying-iris-flowers","text":"Load 2 features from Iris (sepal length and petal width) for visualization purposes: from mlxtend.data import iris_data X, y = iris_data() X = X[:, [0, 3]] # standardize training data X_std = (X - X.mean(axis=0)) / X.std(axis=0) Train a neural network for 3 output flower classes ('Setosa', 'Versicolor', 'Virginica') using regular gradient descent ( minibatches=1 ), 50 hidden units, and no regularization.","title":"Example 1 - Classifying Iris Flowers"},{"location":"user_guide/classifier/MultiLayerPerceptron/#gradient-descent","text":"Setting the minibatches to 1 will result in gradient descent training; please see Gradient Descent vs. Stochastic Gradient Descent for details. from mlxtend.classifier import MultiLayerPerceptron as MLP nn1 = MLP(hidden_layers=[50], l2=0.00, l1=0.0, epochs=150, eta=0.05, momentum=0.1, decrease_const=0.0, minibatches=1, random_seed=1, print_progress=3) nn1 = nn1.fit(X_std, y) Iteration: 150/150 | Cost 0.06 | Elapsed: 0:00:00 | ETA: 0:00:00 from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt fig = plot_decision_regions(X=X_std, y=y, clf=nn1, legend=2) plt.title('Multi-layer Perceptron w.
1 hidden layer (logistic sigmoid)') plt.show() import matplotlib.pyplot as plt plt.plot(range(len(nn1.cost_)), nn1.cost_) plt.ylabel('Cost') plt.xlabel('Epochs') plt.show() print('Accuracy: %.2f%%' % (100 * nn1.score(X_std, y))) Accuracy: 96.67%","title":"Gradient Descent"},{"location":"user_guide/classifier/MultiLayerPerceptron/#stochastic-gradient-descent","text":"Setting minibatches to n_samples will result in stochastic gradient descent training; please see Gradient Descent vs. Stochastic Gradient Descent for details. nn2 = MLP(hidden_layers=[50], l2=0.00, l1=0.0, epochs=5, eta=0.005, momentum=0.1, decrease_const=0.0, minibatches=len(y), random_seed=1, print_progress=3) nn2.fit(X_std, y) plt.plot(range(len(nn2.cost_)), nn2.cost_) plt.ylabel('Cost') plt.xlabel('Epochs') plt.show() Iteration: 5/5 | Cost 0.11 | Elapsed: 00:00:00 | ETA: 00:00:00 Train for 25 epochs (note that fit re-initializes the weights by default; pass init_params=False to continue from the current weights)... nn2.epochs = 25 nn2 = nn2.fit(X_std, y) Iteration: 25/25 | Cost 0.07 | Elapsed: 0:00:00 | ETA: 0:00:00 plt.plot(range(len(nn2.cost_)), nn2.cost_) plt.ylabel('Cost') plt.xlabel('Epochs') plt.show()","title":"Stochastic Gradient Descent"},{"location":"user_guide/classifier/MultiLayerPerceptron/#example-2-classifying-handwritten-digits-from-a-10-mnist-subset","text":"Load a 5000-sample subset of the MNIST dataset (please see data.loadlocal_mnist if you want to download and read in the complete MNIST dataset).
from mlxtend.data import mnist_data from mlxtend.preprocessing import shuffle_arrays_unison X, y = mnist_data() X, y = shuffle_arrays_unison((X, y), random_seed=1) X_train, y_train = X[:500], y[:500] X_test, y_test = X[500:], y[500:] Visualize a sample from the MNIST dataset to check if it was loaded correctly: import matplotlib.pyplot as plt def plot_digit(X, y, idx): img = X[idx].reshape(28,28) plt.imshow(img, cmap='Greys', interpolation='nearest') plt.title('true label: %d' % y[idx]) plt.show() plot_digit(X, y, 3500) Standardize pixel values: import numpy as np from mlxtend.preprocessing import standardize X_train_std, params = standardize(X_train, columns=range(X_train.shape[1]), return_params=True) X_test_std = standardize(X_test, columns=range(X_test.shape[1]), params=params) Initialize the neural network to recognize the 10 different digits (0-9) using 100 epochs and mini-batch learning. nn1 = MLP(hidden_layers=[150], l2=0.00, l1=0.0, epochs=100, eta=0.005, momentum=0.0, decrease_const=0.0, minibatches=100, random_seed=1, print_progress=3) Learn the features while printing the progress to get an idea about how long it may take.
import matplotlib.pyplot as plt nn1.fit(X_train_std, y_train) plt.plot(range(len(nn1.cost_)), nn1.cost_) plt.ylabel('Cost') plt.xlabel('Epochs') plt.show() Iteration: 100/100 | Cost 0.01 | Elapsed: 0:00:17 | ETA: 0:00:00 print('Train Accuracy: %.2f%%' % (100 * nn1.score(X_train_std, y_train))) print('Test Accuracy: %.2f%%' % (100 * nn1.score(X_test_std, y_test))) Train Accuracy: 100.00% Test Accuracy: 84.62% Please note that this neural network has been trained on only 500 samples (10% of the 5000-sample MNIST subset) for technical demonstration purposes, hence the modest predictive performance.","title":"Example 2 - Classifying Handwritten Digits from a 10% MNIST Subset"},{"location":"user_guide/classifier/MultiLayerPerceptron/#api","text":"MultiLayerPerceptron(eta=0.5, epochs=50, hidden_layers=[50], n_classes=None, momentum=0.0, l1=0.0, l2=0.0, dropout=1.0, decrease_const=0.0, minibatches=1, random_seed=None, print_progress=0) Multi-layer perceptron classifier with logistic sigmoid activations Parameters eta : float (default: 0.5) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. hidden_layers : list (default: [50]) Number of units per hidden layer. By default 50 units in the first hidden layer. At the moment only 1 hidden layer is supported n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. l1 : float (default: 0.0) L1 regularization strength l2 : float (default: 0.0) L2 regularization strength momentum : float (default: 0.0) Momentum constant. Factor multiplied with the gradient of the previous epoch t-1 to improve learning speed w(t) := w(t) - (grad(t) + momentum * grad(t-1)) decrease_const : float (default: 0.0) Decrease constant.
Shrinks the learning rate after each epoch via eta / (1 + epoch*decrease_const) minibatches : int (default: 1) Divide the training data into k minibatches for accelerated stochastic gradient descent learning. Gradient Descent Learning if minibatches = 1 Stochastic Gradient Descent learning if minibatches = len(y) Minibatch learning if minibatches > 1 random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape=[n_features, n_classes] Weights after fitting. b_ : 1D-array, shape=[n_classes] Bias units after fitting. cost_ : list List of floats; the mean categorical cross entropy cost after each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/MultiLayerPerceptron/","title":"API"},{"location":"user_guide/classifier/MultiLayerPerceptron/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape= [n_samples, n_classes] score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Methods"},{"location":"user_guide/classifier/Perceptron/","text":"Perceptron Implementation of a Perceptron learning algorithm for classification. from mlxtend.classifier import Perceptron Overview The idea behind this \"thresholded\" perceptron was to mimic how a single neuron in the brain works: It either \"fires\" or not. A perceptron receives multiple input signals, and if the sum of the input signals exceeds a certain threshold, it returns a signal; otherwise, it remains \"silent.\" What made this a \"machine learning\" algorithm was Frank Rosenblatt's idea of the perceptron learning rule: The perceptron algorithm is about learning the weights for the input signals in order to draw a linear decision boundary that allows us to discriminate between the two linearly separable classes +1 and -1. Basic Notation Before we dive deeper into the algorithm(s) for learning the weights of the perceptron classifier, let us take a brief look at the basic notation. In the following sections, we will label the positive and negative class in our binary classification setting as \"1\" and \"-1\", respectively.
Next, we define an activation function g(z) that takes a linear combination of the input values \\mathbf{x} and weights \\mathbf{w} as input ( z = w_1x_{1} + \\dots + w_mx_{m} ); if z is greater than or equal to a defined threshold \\theta, we predict 1, and -1 otherwise. In this case, the activation function g is a simple \"unit step function,\" which is sometimes also called the \"Heaviside step function.\" $$ g(z) = \\begin{cases} 1 & \\text{if $z \\ge \\theta$}\\\\ -1 & \\text{otherwise}. \\end{cases} $$ where z = w_1x_{1} + \\dots + w_mx_{m} = \\sum_{j=1}^{m} x_{j}w_{j} \\\\ = \\mathbf{w}^T\\mathbf{x}. Here, \\mathbf{w} is the weight vector, and \\mathbf{x} is an m -dimensional sample from the training dataset: \\mathbf{w} = \\begin{bmatrix} w_{1} \\\\ \\vdots \\\\ w_{m} \\end{bmatrix} \\quad \\mathbf{x} = \\begin{bmatrix} x_{1} \\\\ \\vdots \\\\ x_{m} \\end{bmatrix} In order to simplify the notation, we bring \\theta to the left side of the equation and define w_0 = -\\theta \\text{ and } x_0=1 so that $$ g({z}) = \\begin{cases} 1 & \\text{if $z \\ge 0$}\\\\ -1 & \\text{otherwise}. \\end{cases} $$ and z = w_0x_{0} + w_1x_{1} + \\dots + w_mx_{m} = \\sum_{j=0}^{m} x_{j}w_{j} \\\\ = \\mathbf{w}^T\\mathbf{x}. Perceptron Rule Rosenblatt's initial perceptron rule is fairly simple and can be summarized by the following steps: Initialize the weights to 0 or small random numbers. For each training sample \\mathbf{x^{(i)}} : Calculate the output value. Update the weights. The output value is the class label predicted by the unit step function that we defined earlier (output = g(z) ), and the weight update can be written more formally as w_j := w_j + \\Delta w_j .
The value for updating the weights at each increment is calculated by the learning rule \\Delta w_j = \\eta \\; (\\text{target}^{(i)} - \\text{output}^{(i)})\\;x^{(i)}_{j} where \\eta is the learning rate (a constant between 0.0 and 1.0), \"target\" is the true class label, and the \"output\" is the predicted class label. It is important to note that all weights in the weight vector are being updated simultaneously. Concretely, for a 2-dimensional dataset, we would write the update as: \\Delta w_0 = \\eta(\\text{target}^{(i)} - \\text{output}^{(i)}) \\Delta w_1 = \\eta(\\text{target}^{(i)} - \\text{output}^{(i)})\\;x^{(i)}_{1} \\Delta w_2 = \\eta(\\text{target}^{(i)} - \\text{output}^{(i)})\\;x^{(i)}_{2} Before we implement the perceptron rule in Python, let us make a simple thought experiment to illustrate how beautifully simple this learning rule really is. In the two scenarios where the perceptron predicts the class label correctly, the weights remain unchanged: \\Delta w_j = \\eta(-1^{(i)} - -1^{(i)})\\;x^{(i)}_{j} = 0 \\Delta w_j = \\eta(1^{(i)} - 1^{(i)})\\;x^{(i)}_{j} = 0 However, in case of a wrong prediction, the weights are being \"pushed\" towards the direction of the positive or negative target class, respectively: \\Delta w_j = \\eta(1^{(i)} - -1^{(i)})\\;x^{(i)}_{j} = \\eta(2)\\;x^{(i)}_{j} \\Delta w_j = \\eta(-1^{(i)} - 1^{(i)})\\;x^{(i)}_{j} = \\eta(-2)\\;x^{(i)}_{j} It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable. If the two classes can't be separated by a linear decision boundary, we can set a maximum number of passes over the training dataset (\"epochs\") and/or a threshold for the number of tolerated misclassifications. References F. Rosenblatt. The perceptron, a perceiving and recognizing automaton (Project Para). Cornell Aeronautical Laboratory, 1957.
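The perceptron rule described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the mlxtend Perceptron class. The helper names perceptron_train and perceptron_predict are hypothetical. Labels are assumed to be -1 and 1, as in the notation above; the mlxtend class itself expects 0 and 1. The weights start at zero, and the bias is handled through an x_0 = 1 column so that w[0] plays the role of w_0 = -theta.

```python
import numpy as np


def perceptron_train(X, y, eta=0.1, epochs=10, seed=0):
    # Hypothetical helper; y must contain the labels -1 and 1.
    rng = np.random.RandomState(seed)
    # prepend x_0 = 1 so that w[0] corresponds to w_0 = -theta
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    errors = []  # number of misclassifications per epoch
    for _ in range(epochs):
        n_err = 0
        for i in rng.permutation(len(y)):
            # unit step function g(z) with the threshold folded into w[0]
            output = 1 if Xb[i].dot(w) >= 0 else -1
            # 0 if the prediction is correct, +/- 2 * eta otherwise
            update = eta * (y[i] - output)
            # all weights, including w[0], are updated simultaneously
            w += update * Xb[i]
            n_err += int(update != 0)
        errors.append(n_err)
    return w, errors


def perceptron_predict(X, w):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.where(Xb.dot(w) >= 0, 1, -1)
```

The update is zero whenever the output equals the target, so the per-epoch error count drops to zero once a separating boundary is found. On data that are not linearly separable it never does, which is why the epochs limit matters.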
Example 1 - Classification of Iris Flowers from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Perceptron import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() # Rosenblatt Perceptron ppn = Perceptron(epochs=5, eta=0.05, random_seed=0, print_progress=3) ppn.fit(X, y) plot_decision_regions(X, y, clf=ppn) plt.title('Perceptron - Rosenblatt Perceptron Rule') plt.show() print('Weights: %s' % ppn.w_) plt.plot(range(len(ppn.cost_)), ppn.cost_) plt.xlabel('Epochs') plt.ylabel('Misclassifications') plt.show() Iteration: 5/5 | Elapsed: 00:00:00 | ETA: 00:00:00 Weights: [[-0.04500809] [ 0.11048855]] API Perceptron(eta=0.1, epochs=50, random_seed=None, print_progress=0) Perceptron classifier. Note that this implementation of the Perceptron expects binary class labels in {0, 1}. Parameters eta : float (default: 0.1) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Number of passes over the training dataset. Prior to each epoch, the dataset is shuffled to prevent cycles. random_seed : int Random state for initializing random weights and shuffling. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Number of misclassifications in every epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Perceptron/ Methods fit(X, y, init_params=True) Learn model from training data.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Perceptron"},{"location":"user_guide/classifier/Perceptron/#perceptron","text":"Implementation of a Perceptron learning algorithm for classification. from mlxtend.classifier import Perceptron","title":"Perceptron"},{"location":"user_guide/classifier/Perceptron/#overview","text":"The idea behind this \"thresholded\" perceptron was to mimic how a single neuron in the brain works: It either \"fires\" or not. A perceptron receives multiple input signals, and if the sum of the input signals exceeds a certain threshold, it returns a signal; otherwise, it remains \"silent.\"
What made this a \"machine learning\" algorithm was Frank Rosenblatt's idea of the perceptron learning rule: The perceptron algorithm is about learning the weights for the input signals in order to draw a linear decision boundary that allows us to discriminate between the two linearly separable classes +1 and -1.","title":"Overview"},{"location":"user_guide/classifier/Perceptron/#basic-notation","text":"Before we dive deeper into the algorithm(s) for learning the weights of the perceptron classifier, let us take a brief look at the basic notation. In the following sections, we will label the positive and negative class in our binary classification setting as \"1\" and \"-1\", respectively. Next, we define an activation function g(z) that takes a linear combination of the input values \\mathbf{x} and weights \\mathbf{w} as input ( z = w_1x_{1} + \\dots + w_mx_{m} ); if z is greater than or equal to a defined threshold \\theta, we predict 1, and -1 otherwise. In this case, the activation function g is a simple \"unit step function,\" which is sometimes also called the \"Heaviside step function.\" $$ g(z) = \\begin{cases} 1 & \\text{if $z \\ge \\theta$}\\\\ -1 & \\text{otherwise}. \\end{cases} $$ where z = w_1x_{1} + \\dots + w_mx_{m} = \\sum_{j=1}^{m} x_{j}w_{j} \\\\ = \\mathbf{w}^T\\mathbf{x}. Here, \\mathbf{w} is the weight vector, and \\mathbf{x} is an m -dimensional sample from the training dataset: \\mathbf{w} = \\begin{bmatrix} w_{1} \\\\ \\vdots \\\\ w_{m} \\end{bmatrix} \\quad \\mathbf{x} = \\begin{bmatrix} x_{1} \\\\ \\vdots \\\\ x_{m} \\end{bmatrix} In order to simplify the notation, we bring \\theta to the left side of the equation and define w_0 = -\\theta \\text{ and } x_0=1 so that $$ g({z}) = \\begin{cases} 1 & \\text{if $z \\ge 0$}\\\\ -1 & \\text{otherwise}.
\\end{cases} $$ and z = w_0x_{0} + w_1x_{1} + \\dots + w_mx_{m} = \\sum_{j=0}^{m} x_{j}w_{j} = \\mathbf{w}^T\\mathbf{x}.","title":"Basic Notation"},{"location":"user_guide/classifier/Perceptron/#perceptron-rule","text":"Rosenblatt's initial perceptron rule is fairly simple and can be summarized by the following steps: Initialize the weights to 0 or small random numbers. For each training sample \\mathbf{x}^{(i)} : Calculate the output value. Update the weights. The output value is the class label predicted by the unit step function that we defined earlier (output = g(z) ), and the weight update can be written more formally as w_j := w_j + \\Delta w_j . The value for updating the weights at each increment is calculated by the learning rule \\Delta w_j = \\eta \\; (\\text{target}^{(i)} - \\text{output}^{(i)})\\;x^{(i)}_{j} where \\eta is the learning rate (a constant between 0.0 and 1.0), \"target\" is the true class label, and the \"output\" is the predicted class label. It is important to note that all weights in the weight vector are updated simultaneously. Concretely, for a 2-dimensional dataset, we would write the update as: \\Delta w_0 = \\eta(\\text{target}^{(i)} - \\text{output}^{(i)}) \\Delta w_1 = \\eta(\\text{target}^{(i)} - \\text{output}^{(i)})\\;x^{(i)}_{1} \\Delta w_2 = \\eta(\\text{target}^{(i)} - \\text{output}^{(i)})\\;x^{(i)}_{2} Before we implement the perceptron rule in Python, let us do a simple thought experiment to illustrate how beautifully simple this learning rule really is.
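The steps and the update rule above map almost line by line onto NumPy code. The following is a minimal sketch that rests on our own assumptions. Labels are in {-1, 1} and the weights start at zero. The bias weight w_0 is stored as the first entry of w, paired with the constant input x_0 = 1. The helper names perceptron_fit and perceptron_predict are hypothetical. This is not the mlxtend Perceptron implementation, which expects labels in {0, 1}.

```python
import numpy as np


def perceptron_fit(X, y, eta=0.1, epochs=10, seed=0):
    # Rosenblatt rule sketch: labels are assumed to be in {-1, 1};
    # w[0] is the bias weight w_0 paired with the constant input x_0 = 1.
    rng = np.random.RandomState(seed)
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1
    w = np.zeros(X_aug.shape[1])
    errors_per_epoch = []
    for _ in range(epochs):
        errors = 0
        for i in rng.permutation(len(y)):  # shuffle samples each epoch
            output = 1 if X_aug[i].dot(w) >= 0.0 else -1  # unit step g(z)
            update = eta * (y[i] - output)  # 0 if correct, +/-2*eta if wrong
            w += update * X_aug[i]  # all weights updated simultaneously
            errors += int(update != 0.0)
        errors_per_epoch.append(errors)
    return w, errors_per_epoch


def perceptron_predict(X, w):
    # Net input z = w_0 + w_1 x_1 + ... + w_m x_m, then the unit step.
    z = w[0] + X.dot(w[1:])
    return np.where(z >= 0.0, 1, -1)
```

On a small linearly separable toy set, the per-epoch error count should drop to 0 after a few updates, in line with the convergence guarantee for linearly separable classes. An example is X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]] with y = [1, 1, -1, -1].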
In the two scenarios where the perceptron predicts the class label correctly, the weights remain unchanged: \\Delta w_j = \\eta\\big(-1 - (-1)\\big)\\;x^{(i)}_{j} = 0 and \\Delta w_j = \\eta(1 - 1)\\;x^{(i)}_{j} = 0. However, in the case of a wrong prediction, the weights are \"pushed\" towards the direction of the positive or negative target class, respectively: \\Delta w_j = \\eta\\big(1 - (-1)\\big)\\;x^{(i)}_{j} = 2\\eta\\;x^{(i)}_{j} and \\Delta w_j = \\eta(-1 - 1)\\;x^{(i)}_{j} = -2\\eta\\;x^{(i)}_{j}. It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable. If the two classes can't be separated by a linear decision boundary, we can set a maximum number of passes over the training dataset (\"epochs\") and/or a threshold for the number of tolerated misclassifications.","title":"Perceptron Rule"},{"location":"user_guide/classifier/Perceptron/#references","text":"F. Rosenblatt. The perceptron, a perceiving and recognizing automaton Project Para.
Cornell Aeronautical Laboratory, 1957.","title":"References"},{"location":"user_guide/classifier/Perceptron/#example-1-classification-of-iris-flowers","text":"from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import Perceptron import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width X = X[0:100] # class 0 and class 1 y = y[0:100] # class 0 and class 1 # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() # Rosenblatt Perceptron ppn = Perceptron(epochs=5, eta=0.05, random_seed=0, print_progress=3) ppn.fit(X, y) plot_decision_regions(X, y, clf=ppn) plt.title('Perceptron - Rosenblatt Perceptron Rule') plt.show() print('Weights: %s' % ppn.w_) plt.plot(range(len(ppn.cost_)), ppn.cost_) plt.xlabel('Iterations') plt.ylabel('Misclassifications') plt.show() Iteration: 5/5 | Elapsed: 00:00:00 | ETA: 00:00:00 Weights: [[-0.04500809] [ 0.11048855]]","title":"Example 1 - Classification of Iris Flowers"},{"location":"user_guide/classifier/Perceptron/#api","text":"Perceptron(eta=0.1, epochs=50, random_seed=None, print_progress=0) Perceptron classifier. Note that this implementation of the Perceptron expects binary class labels in {0, 1}. Parameters eta : float (default: 0.1) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Number of passes over the training dataset. Prior to each epoch, the dataset is shuffled to prevent cycles. random_seed : int (default: None) Random state for initializing random weights and shuffling. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, 1} Model weights after fitting. b_ : 1d-array, shape={1,} Bias unit after fitting. cost_ : list Number of misclassifications in every epoch.
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Perceptron/","title":"API"},{"location":"user_guide/classifier/Perceptron/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Methods"},{"location":"user_guide/classifier/SoftmaxRegression/","text":"Softmax Regression A logistic regression class for multi-class classification tasks. from mlxtend.classifier import SoftmaxRegression Overview Softmax Regression (synonyms: Multinomial Logistic , Maximum Entropy Classifier , or just Multi-class Logistic Regression ) is a generalization of logistic regression that we can use for multi-class classification (under the assumption that the classes are mutually exclusive). In contrast, we use the (standard) Logistic Regression model in binary classification tasks. 
Below is a schematic of a Logistic Regression model; for more details, please see the LogisticRegression manual . In Softmax Regression (SMR), we replace the sigmoid logistic function by the so-called softmax function \\phi_{softmax}(\\cdot) . P(y=j \\mid z^{(i)}) = \\phi_{softmax}(z^{(i)})_j = \\frac{e^{z_{j}^{(i)}}}{\\sum_{h=0}^{k} e^{z_{h}^{(i)}}}, where we define the net input z_j of class j as z_j = w_{1,j}x_1 + \\dots + w_{m,j}x_m + b_j = \\sum_{l=1}^{m} w_{l,j} x_l + b_j = \\mathbf{w}_j^T\\mathbf{x} + b_j. ( \\mathbf{w}_j is the weight vector of class j, \\mathbf{x} is the feature vector of one training sample, and b_j is the bias unit of class j.) Now, this softmax function computes the probability that the training sample \\mathbf{x}^{(i)} belongs to class j given the weights and the net input z^{(i)} . So, we compute the probability p(y = j \\mid \\mathbf{x}^{(i)}; \\mathbf{w}_j) for each class label j = 0, \\ldots, k . Note the normalization term in the denominator, which causes these class probabilities to sum up to one. To illustrate the concept of softmax, let us walk through a concrete example. Let's assume we have a training set consisting of 4 samples from 3 different classes (0, 1, and 2): x_0 \\rightarrow \\text{class }0 x_1 \\rightarrow \\text{class }1 x_2 \\rightarrow \\text{class }2 x_3 \\rightarrow \\text{class }2 import numpy as np y = np.array([0, 1, 2, 2]) First, we want to encode the class labels into a format that we can more easily work with; we apply one-hot encoding: y_enc = (np.arange(np.max(y) + 1) == y[:, None]).astype(float) print('one-hot encoding:\\n', y_enc) one-hot encoding: [[ 1. 0. 0.] [ 0. 1. 0.] [ 0. 0. 1.] [ 0. 0. 1.]] A sample that belongs to class 0 (the first row) has a 1 in the first cell, a sample that belongs to class 1 has a 1 in the second cell of its row, and so forth. Next, let us define the feature matrix of our 4 training samples. Here, we assume that our dataset consists of 2 features; thus, we create a 4x2 dimensional matrix of our samples and features.
Similarly, we create a 2x3 dimensional weight matrix (one row per feature and one column for each class). X = np.array([[0.1, 0.5], [1.1, 2.3], [-1.1, -2.3], [-1.5, -2.5]]) W = np.array([[0.1, 0.2, 0.3], [0.1, 0.2, 0.3]]) bias = np.array([0.01, 0.1, 0.1]) print('Inputs X:\\n', X) print('\\nWeights W:\\n', W) print('\\nbias:\\n', bias) Inputs X: [[ 0.1 0.5] [ 1.1 2.3] [-1.1 -2.3] [-1.5 -2.5]] Weights W: [[ 0.1 0.2 0.3] [ 0.1 0.2 0.3]] bias: [ 0.01 0.1 0.1 ] To compute the net input, we multiply the 4x2 feature matrix X with the 2x3 (n_features x n_classes) weight matrix W , which yields a 4x3 output matrix (n_samples x n_classes) to which we then add the bias vector (one bias unit per class, added to every row): \\mathbf{Z} = \\mathbf{X}\\mathbf{W} + \\mathbf{b}. def net_input(X, W, b): return (X.dot(W) + b) net_in = net_input(X, W, bias) print('net input:\\n', net_in) net input: [[ 0.07 0.22 0.28] [ 0.35 0.78 1.12] [-0.33 -0.58 -0.92] [-0.39 -0.7 -1.1 ]] Now, it's time to compute the softmax activation that we discussed earlier: P(y=j \\mid z^{(i)}) = \\phi_{softmax}(z^{(i)})_j = \\frac{e^{z_{j}^{(i)}}}{\\sum_{h=0}^{k} e^{z_{h}^{(i)}}}. def softmax(z): return (np.exp(z.T) / np.sum(np.exp(z), axis=1)).T smax = softmax(net_in) print('softmax:\\n', smax) softmax: [[ 0.29450637 0.34216758 0.36332605] [ 0.21290077 0.32728332 0.45981591] [ 0.42860913 0.33380113 0.23758974] [ 0.44941979 0.32962558 0.22095463]] As we can see, the values for each sample (row) now sum up to 1. E.g., we can say that the first sample [ 0.29450637 0.34216758 0.36332605] has a 29.45% probability of belonging to class 0.
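The softmax function above exponentiates the raw net inputs. That works for this small example, but it can overflow when net inputs are large, because np.exp(1000.0) is inf in float64. A common variant subtracts each row's maximum before exponentiating. The factor exp(-max) cancels between numerator and denominator, so the probabilities are mathematically unchanged. The sketch below assumes z is a 2D NumPy array of shape (n_samples, n_classes). The name stable_softmax is our own and is not part of mlxtend.

```python
import numpy as np


def stable_softmax(z):
    # Shift each row so its largest entry is 0; exp() then stays in (0, 1].
    z_shifted = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(z_shifted)
    # Normalize each row so the class probabilities sum to 1.
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)


# Net inputs from the worked example above.
net_in = np.array([[0.07, 0.22, 0.28],
                   [0.35, 0.78, 1.12],
                   [-0.33, -0.58, -0.92],
                   [-0.39, -0.70, -1.10]])
probas = stable_softmax(net_in)
print(probas.argmax(axis=1))  # [2 2 0 0], the same labels as to_classlabel

# Net inputs this large would overflow a naive np.exp(z).
print(stable_softmax(np.array([[1000.0, 1001.0, 1002.0]])))
```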
Now, in order to turn these probabilities back into class labels, we could simply take the argmax-index position of each row: [[ 0.29450637 0.34216758 0.36332605 ] -> 2 [ 0.21290077 0.32728332 0.45981591 ] -> 2 [ 0.42860913 0.33380113 0.23758974] -> 0 [ 0.44941979 0.32962558 0.22095463]] -> 0 def to_classlabel(z): return z.argmax(axis=1) print('predicted class labels: ', to_classlabel(smax)) predicted class labels: [2 2 0 0] As we can see, all four predictions are wrong, since the correct class labels are [0, 1, 2, 2] . Now, in order to train our softmax model (e.g., via an optimization algorithm such as gradient descent), we need to define a cost function J(\\cdot) that we want to minimize: J(\\mathbf{W}; \\mathbf{b}) = \\frac{1}{n} \\sum_{i=1}^{n} H(T_i, O_i), which is the average of all cross-entropies over our n training samples. The cross-entropy function is defined as H(T_i, O_i) = -\\sum_{j=0}^{k} T_{i,j} \\log(O_{i,j}). Here, T stands for \"target\" (i.e., the one-hot encoded true class labels) and O stands for output, i.e., the probabilities computed via softmax, not the predicted class labels. def cross_entropy(output, y_target): return - np.sum(np.log(output) * (y_target), axis=1) xent = cross_entropy(smax, y_enc) print('Cross Entropy:', xent) Cross Entropy: [ 1.22245465 1.11692907 1.43720989 1.50979788] def cost(output, y_target): return np.mean(cross_entropy(output, y_target)) J_cost = cost(smax, y_enc) print('Cost: ', J_cost) Cost: 1.32159787159 In order to learn the weight coefficients of our softmax model via gradient descent, we then need to compute the derivative \\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}).
We skip the tedious details of the derivation here, but the gradient of the cost with respect to \\mathbf{w}_j turns out to be simply: \\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}) = \\frac{1}{n} \\sum_{i=1}^{n} \\mathbf{x}^{(i)} \\big(O_{i,j} - T_{i,j}\\big). We can then use this gradient to update the weights in the opposite direction of the cost gradient with learning rate \\eta : \\mathbf{w}_j := \\mathbf{w}_j - \\eta \\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}) for each class j \\in \\{0, 1, \\ldots, k\\} (note that \\mathbf{w}_j is the weight vector for the class y=j ), and we update the bias units as b_j := b_j - \\eta \\bigg[ \\frac{1}{n} \\sum_{i=1}^{n} \\big(O_{i,j} - T_{i,j}\\big) \\bigg]. To penalize model complexity, which reduces the variance of the model and the degree of overfitting at the cost of introducing some additional bias, we can further add a regularization term such as the L2 term with the regularization parameter \\lambda : L2: \\frac{\\lambda}{2} ||\\mathbf{W}||_{2}^{2} , where ||\\mathbf{W}||_{2}^{2} = \\sum_{l=1}^{m} \\sum_{j=0}^{k} w_{l,j}^{2} , so that our cost function becomes J(\\mathbf{W}; \\mathbf{b}) = \\frac{1}{n} \\sum_{i=1}^{n} H(T_i, O_i) + \\frac{\\lambda}{2} ||\\mathbf{W}||_{2}^{2} , and we define the \"regularized\" weight update as \\mathbf{w}_j := \\mathbf{w}_j - \\eta \\big[\\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}) + \\lambda \\mathbf{w}_j \\big], where \\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}) denotes the unregularized gradient from above. (Please note that we don't regularize the bias term.)
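Taken together, the cost, the gradients and the regularized update rules above form a short full-batch gradient descent loop. The sketch below is a minimal illustration and not mlxtend's SoftmaxRegression code. The function names, the small Gaussian weight initialization, and the default values for eta, epochs and l2 are our own assumptions. It also applies the row-max shift inside softmax for numerical stability, which leaves the probabilities unchanged.

```python
import numpy as np


def one_hot(y, n_classes):
    # Row i has a 1 in column y[i] and 0 elsewhere.
    return (np.arange(n_classes) == y[:, None]).astype(float)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # row-max shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)


def fit_softmax_gd(X, y, eta=0.1, epochs=200, l2=0.0, seed=1):
    # Full-batch gradient descent on the average cross-entropy plus an
    # optional (l2 / 2) * ||W||^2 penalty; the bias units are not regularized.
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    n_classes = int(y.max()) + 1
    T = one_hot(y, n_classes)                      # targets, (n, k)
    W = rng.normal(scale=0.01, size=(n_features, n_classes))
    b = np.zeros(n_classes)
    costs = []
    for _ in range(epochs):
        O = softmax(X.dot(W) + b)                  # outputs, (n, k)
        xent = -np.sum(T * np.log(O), axis=1)      # per-sample cross-entropy
        costs.append(np.mean(xent) + 0.5 * l2 * np.sum(W ** 2))
        grad_W = X.T.dot(O - T) / n_samples        # column j: gradient for w_j
        grad_b = np.mean(O - T, axis=0)
        W -= eta * (grad_W + l2 * W)               # regularized weight update
        b -= eta * grad_b                          # unregularized bias update
    return W, b, costs


# The 4-sample toy data from the walkthrough above.
X = np.array([[0.1, 0.5], [1.1, 2.3], [-1.1, -2.3], [-1.5, -2.5]])
y = np.array([0, 1, 2, 2])
W, b, costs = fit_softmax_gd(X, y, eta=0.1, epochs=200)
print(costs[0], costs[-1])
```

Each cost is recorded before the update of its epoch. With a small enough learning rate the recorded cost should fall from roughly log(3) toward zero.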
Example 1 - Gradient Descent from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import SoftmaxRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() lr = SoftmaxRegression(eta=0.01, epochs=500, minibatches=1, random_seed=1, print_progress=3) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Softmax Regression - Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 500/500 | Cost 0.06 | Elapsed: 0:00:00 | ETA: 0:00:00 Predicting Class Labels y_pred = lr.predict(X) print('Last 3 Class Labels: %s' % y_pred[-3:]) Last 3 Class Labels: [2 2 2] Predicting Class Probabilities y_proba = lr.predict_proba(X) print('Last 3 Class Probabilities:\\n %s' % y_proba[-3:]) Last 3 Class Probabilities: [[ 9.18728149e-09 1.68894679e-02 9.83110523e-01] [ 2.97052325e-11 7.26356627e-04 9.99273643e-01] [ 1.57464093e-06 1.57779528e-01 8.42218897e-01]] Example 2 - Stochastic Gradient Descent from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import SoftmaxRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() lr = SoftmaxRegression(eta=0.01, epochs=300, minibatches=len(y), random_seed=1) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Softmax Regression - Stochastic Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() API SoftmaxRegression(eta=0.01, epochs=50, l2=0.0, minibatches=1, n_classes=None, random_seed=None, print_progress=0) Softmax regression
classifier. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2 : float (default: 0.0) Regularization parameter for L2 regularization. No regularization if l2=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, n_classes} Model weights after fitting. b_ : 1d-array, shape={n_classes,} Bias units after fitting. cost_ : list List of floats, the average cross_entropy for each epoch. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/SoftmaxRegression/ Methods fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X.
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape= [n_samples, n_classes] score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Softmax Regression"},{"location":"user_guide/classifier/SoftmaxRegression/#softmax-regression","text":"A logistic regression class for multi-class classification tasks. from mlxtend.classifier import SoftmaxRegression","title":"Softmax Regression"},{"location":"user_guide/classifier/SoftmaxRegression/#overview","text":"Softmax Regression (synonyms: Multinomial Logistic , Maximum Entropy Classifier , or just Multi-class Logistic Regression ) is a generalization of logistic regression that we can use for multi-class classification (under the assumption that the classes are mutually exclusive). In contrast, we use the (standard) Logistic Regression model in binary classification tasks. Below is a schematic of a Logistic Regression model; for more details, please see the LogisticRegression manual . In Softmax Regression (SMR), we replace the sigmoid logistic function by the so-called softmax function \\phi_{softmax}(\\cdot) .
P(y=j \\mid z^{(i)}) = \\phi_{softmax}(z^{(i)})_j = \\frac{e^{z_{j}^{(i)}}}{\\sum_{h=0}^{k} e^{z_{h}^{(i)}}}, where we define the net input z_j of class j as z_j = w_{1,j}x_1 + \\dots + w_{m,j}x_m + b_j = \\sum_{l=1}^{m} w_{l,j} x_l + b_j = \\mathbf{w}_j^T\\mathbf{x} + b_j. ( \\mathbf{w}_j is the weight vector of class j, \\mathbf{x} is the feature vector of one training sample, and b_j is the bias unit of class j.) Now, this softmax function computes the probability that the training sample \\mathbf{x}^{(i)} belongs to class j given the weights and the net input z^{(i)} . So, we compute the probability p(y = j \\mid \\mathbf{x}^{(i)}; \\mathbf{w}_j) for each class label j = 0, \\ldots, k . Note the normalization term in the denominator, which causes these class probabilities to sum up to one. To illustrate the concept of softmax, let us walk through a concrete example. Let's assume we have a training set consisting of 4 samples from 3 different classes (0, 1, and 2): x_0 \\rightarrow \\text{class }0 x_1 \\rightarrow \\text{class }1 x_2 \\rightarrow \\text{class }2 x_3 \\rightarrow \\text{class }2 import numpy as np y = np.array([0, 1, 2, 2]) First, we want to encode the class labels into a format that we can more easily work with; we apply one-hot encoding: y_enc = (np.arange(np.max(y) + 1) == y[:, None]).astype(float) print('one-hot encoding:\\n', y_enc) one-hot encoding: [[ 1. 0. 0.] [ 0. 1. 0.] [ 0. 0. 1.] [ 0. 0. 1.]] A sample that belongs to class 0 (the first row) has a 1 in the first cell, a sample that belongs to class 1 has a 1 in the second cell of its row, and so forth. Next, let us define the feature matrix of our 4 training samples. Here, we assume that our dataset consists of 2 features; thus, we create a 4x2 dimensional matrix of our samples and features. Similarly, we create a 2x3 dimensional weight matrix (one row per feature and one column for each class).
X = np.array([[0.1, 0.5], [1.1, 2.3], [-1.1, -2.3], [-1.5, -2.5]]) W = np.array([[0.1, 0.2, 0.3], [0.1, 0.2, 0.3]]) bias = np.array([0.01, 0.1, 0.1]) print('Inputs X:\\n', X) print('\\nWeights W:\\n', W) print('\\nbias:\\n', bias) Inputs X: [[ 0.1 0.5] [ 1.1 2.3] [-1.1 -2.3] [-1.5 -2.5]] Weights W: [[ 0.1 0.2 0.3] [ 0.1 0.2 0.3]] bias: [ 0.01 0.1 0.1 ] To compute the net input, we multiply the 4x2 feature matrix X with the 2x3 (n_features x n_classes) weight matrix W , which yields a 4x3 output matrix (n_samples x n_classes) to which we then add the bias vector (one bias unit per class, added to every row): \\mathbf{Z} = \\mathbf{X}\\mathbf{W} + \\mathbf{b}. def net_input(X, W, b): return (X.dot(W) + b) net_in = net_input(X, W, bias) print('net input:\\n', net_in) net input: [[ 0.07 0.22 0.28] [ 0.35 0.78 1.12] [-0.33 -0.58 -0.92] [-0.39 -0.7 -1.1 ]] Now, it's time to compute the softmax activation that we discussed earlier: P(y=j \\mid z^{(i)}) = \\phi_{softmax}(z^{(i)})_j = \\frac{e^{z_{j}^{(i)}}}{\\sum_{h=0}^{k} e^{z_{h}^{(i)}}}. def softmax(z): return (np.exp(z.T) / np.sum(np.exp(z), axis=1)).T smax = softmax(net_in) print('softmax:\\n', smax) softmax: [[ 0.29450637 0.34216758 0.36332605] [ 0.21290077 0.32728332 0.45981591] [ 0.42860913 0.33380113 0.23758974] [ 0.44941979 0.32962558 0.22095463]] As we can see, the values for each sample (row) now sum up to 1. E.g., we can say that the first sample [ 0.29450637 0.34216758 0.36332605] has a 29.45% probability of belonging to class 0.
Now, in order to turn these probabilities back into class labels, we could simply take the argmax-index position of each row: [[ 0.29450637 0.34216758 0.36332605 ] -> 2 [ 0.21290077 0.32728332 0.45981591 ] -> 2 [ 0.42860913 0.33380113 0.23758974] -> 0 [ 0.44941979 0.32962558 0.22095463]] -> 0 def to_classlabel(z): return z.argmax(axis=1) print('predicted class labels: ', to_classlabel(smax)) predicted class labels: [2 2 0 0] As we can see, all four predictions are wrong, since the correct class labels are [0, 1, 2, 2] . Now, in order to train our softmax model (e.g., via an optimization algorithm such as gradient descent), we need to define a cost function J(\\cdot) that we want to minimize: J(\\mathbf{W}; \\mathbf{b}) = \\frac{1}{n} \\sum_{i=1}^{n} H(T_i, O_i), which is the average of all cross-entropies over our n training samples. The cross-entropy function is defined as H(T_i, O_i) = -\\sum_{j=0}^{k} T_{i,j} \\log(O_{i,j}). Here, T stands for \"target\" (i.e., the one-hot encoded true class labels) and O stands for output, i.e., the probabilities computed via softmax, not the predicted class labels. def cross_entropy(output, y_target): return - np.sum(np.log(output) * (y_target), axis=1) xent = cross_entropy(smax, y_enc) print('Cross Entropy:', xent) Cross Entropy: [ 1.22245465 1.11692907 1.43720989 1.50979788] def cost(output, y_target): return np.mean(cross_entropy(output, y_target)) J_cost = cost(smax, y_enc) print('Cost: ', J_cost) Cost: 1.32159787159 In order to learn the weight coefficients of our softmax model via gradient descent, we then need to compute the derivative \\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}).
We skip the tedious details of the derivation here, but the gradient of the cost with respect to \\mathbf{w}_j turns out to be simply: \\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}) = \\frac{1}{n} \\sum_{i=1}^{n} \\mathbf{x}^{(i)} \\big(O_{i,j} - T_{i,j}\\big). We can then use this gradient to update the weights in the opposite direction of the cost gradient with learning rate \\eta : \\mathbf{w}_j := \\mathbf{w}_j - \\eta \\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}) for each class j \\in \\{0, 1, \\ldots, k\\} (note that \\mathbf{w}_j is the weight vector for the class y=j ), and we update the bias units as b_j := b_j - \\eta \\bigg[ \\frac{1}{n} \\sum_{i=1}^{n} \\big(O_{i,j} - T_{i,j}\\big) \\bigg]. To penalize model complexity, which reduces the variance of the model and the degree of overfitting at the cost of introducing some additional bias, we can further add a regularization term such as the L2 term with the regularization parameter \\lambda : L2: \\frac{\\lambda}{2} ||\\mathbf{W}||_{2}^{2} , where ||\\mathbf{W}||_{2}^{2} = \\sum_{l=1}^{m} \\sum_{j=0}^{k} w_{l,j}^{2} , so that our cost function becomes J(\\mathbf{W}; \\mathbf{b}) = \\frac{1}{n} \\sum_{i=1}^{n} H(T_i, O_i) + \\frac{\\lambda}{2} ||\\mathbf{W}||_{2}^{2} , and we define the \"regularized\" weight update as \\mathbf{w}_j := \\mathbf{w}_j - \\eta \\big[\\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}) + \\lambda \\mathbf{w}_j \\big], where \\nabla \\mathbf{w}_j \\, J(\\mathbf{W}; \\mathbf{b}) denotes the unregularized gradient from above.
(Please note that we don't regularize the bias term.)","title":"Overview"},{"location":"user_guide/classifier/SoftmaxRegression/#example-1-gradient-descent","text":"from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import SoftmaxRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std() lr = SoftmaxRegression(eta=0.01, epochs=500, minibatches=1, random_seed=1, print_progress=3) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Softmax Regression - Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show() Iteration: 500/500 | Cost 0.06 | Elapsed: 0:00:00 | ETA: 0:00:00","title":"Example 1 - Gradient Descent"},{"location":"user_guide/classifier/SoftmaxRegression/#predicting-class-labels","text":"y_pred = lr.predict(X) print('Last 3 Class Labels: %s' % y_pred[-3:]) Last 3 Class Labels: [2 2 2]","title":"Predicting Class Labels"},{"location":"user_guide/classifier/SoftmaxRegression/#predicting-class-probabilities","text":"y_proba = lr.predict_proba(X) print('Last 3 Class Probabilities:\\n %s' % y_proba[-3:]) Last 3 Class Probabilities: [[ 9.18728149e-09 1.68894679e-02 9.83110523e-01] [ 2.97052325e-11 7.26356627e-04 9.99273643e-01] [ 1.57464093e-06 1.57779528e-01 8.42218897e-01]]","title":"Predicting Class Probabilities"},{"location":"user_guide/classifier/SoftmaxRegression/#example-2-stochastic-gradient-descent","text":"from mlxtend.data import iris_data from mlxtend.plotting import plot_decision_regions from mlxtend.classifier import SoftmaxRegression import matplotlib.pyplot as plt # Loading Data X, y = iris_data() X = X[:, [0, 3]] # sepal length and petal width # standardize X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std() X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()
lr = SoftmaxRegression(eta=0.01, epochs=300, minibatches=len(y), random_seed=1) lr.fit(X, y) plot_decision_regions(X, y, clf=lr) plt.title('Softmax Regression - Stochastic Gradient Descent') plt.show() plt.plot(range(len(lr.cost_)), lr.cost_) plt.xlabel('Iterations') plt.ylabel('Cost') plt.show()","title":"Example 2 - Stochastic Gradient Descent"},{"location":"user_guide/classifier/SoftmaxRegression/#api","text":"SoftmaxRegression(eta=0.01, epochs=50, l2=0.0, minibatches=1, n_classes=None, random_seed=None, print_progress=0) Softmax regression classifier. Parameters eta : float (default: 0.01) Learning rate (between 0.0 and 1.0) epochs : int (default: 50) Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. l2 : float (default: 0.0) Regularization parameter for L2 regularization. No regularization if l2=0.0. minibatches : int (default: 1) The number of minibatches for gradient-based optimization. If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent (SGD) online learning If 1 < minibatches < len(y): SGD Minibatch learning n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. random_seed : int (default: None) Set random state for shuffling and initializing the weights. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes w_ : 2d-array, shape={n_features, n_classes} Model weights after fitting. b_ : 1d-array, shape={n_classes,} Bias units after fitting. cost_ : list List of floats, the average cross_entropy for each epoch.
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/SoftmaxRegression/","title":"API"},{"location":"user_guide/classifier/SoftmaxRegression/#methods","text":"fit(X, y, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values. predict_proba(X) Predict class probabilities of X from the net input. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns Class probabilities : array-like, shape= [n_samples, n_classes] score(X, y) Compute the prediction accuracy Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values (true class labels). Returns acc : float The prediction accuracy as a float between 0.0 and 1.0 (perfect score).","title":"Methods"},{"location":"user_guide/classifier/StackingCVClassifier/","text":"StackingCVClassifier An ensemble-learning meta-classifier for stacking using cross-validation to prepare the inputs for the level-2 classifier to prevent overfitting.
from mlxtend.classifier import StackingCVClassifier Overview Stacking is an ensemble learning technique to combine multiple classification models via a meta-classifier. The StackingCVClassifier extends the standard stacking algorithm (implemented as StackingClassifier ) using cross-validation to prepare the input data for the level-2 classifier. In the standard stacking procedure, the first-level classifiers are fit to the same training set that is used to prepare the inputs for the second-level classifier, which may lead to overfitting. The StackingCVClassifier , however, uses the concept of cross-validation: the dataset is split into k folds, and in k successive rounds, k-1 folds are used to fit the first-level classifiers; in each round, the first-level classifiers are then applied to the remaining fold that was not used for model fitting in that round. The resulting predictions are then stacked and provided as input data to the second-level classifier. After the training of the StackingCVClassifier , the first-level classifiers are fit to the entire dataset as illustrated in the figure below. More formally, the Stacking Cross-Validation algorithm can be summarized as follows (source: [1]): References [1] Tang, J., S. Alelyani, and H. Liu. \"Data Classification: Algorithms and Applications.\" Data Mining and Knowledge Discovery Series, CRC Press (2015): pp. 498-500. [2] Wolpert, David H. \"Stacked generalization.\" Neural Networks 5.2 (1992): 241-259.
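The cross-validation procedure described in the overview can be illustrated with scikit-learn's cross_val_predict, which returns an out-of-fold prediction for every training sample. The sketch below is our own simplified illustration of the idea and not the StackingCVClassifier implementation. Several choices are assumptions made for the example: the two base classifiers, cv=3, and the use of class probabilities (method='predict_proba') as meta-features. The helper name stacked_predict is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X = X[:, 1:3]  # same two features as Example 1 below

base_clfs = [KNeighborsClassifier(n_neighbors=1), GaussianNB()]

# Level 1: every row of meta_features is predicted by a model that was fit
# on the other k-1 folds, i.e., a model that never saw that training sample.
meta_features = np.hstack([
    cross_val_predict(clf, X, y, cv=3, method='predict_proba')
    for clf in base_clfs
])

# Level 2: fit the meta-classifier on the out-of-fold predictions.
meta_clf = LogisticRegression()
meta_clf.fit(meta_features, y)

# Finally, refit the first-level classifiers on the entire dataset.
for clf in base_clfs:
    clf.fit(X, y)


def stacked_predict(X_new):
    # Hypothetical helper: chain the refit base models and the meta-classifier.
    level1 = np.hstack([clf.predict_proba(X_new) for clf in base_clfs])
    return meta_clf.predict(level1)


print(meta_features.shape)  # (150, 6): 3 classes x 2 base classifiers
```

The key design point is that the meta-classifier is never trained on predictions that a base model made for its own training samples. Refitting the base models on all data afterwards mirrors the final step described in the overview.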
Example 1 - Simple Stacking CV Classification from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.neighbors import KNeighborsClassifier from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import StackingCVClassifier import numpy as np RANDOM_SEED = 42 clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=RANDOM_SEED) clf3 = GaussianNB() lr = LogisticRegression() # The StackingCVClassifier uses scikit-learn's check_cv # internally, which doesn't support a random seed. Thus # NumPy's random seed needs to be specified explicitly for # deterministic behavior np.random.seed(RANDOM_SEED) sclf = StackingCVClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr) print('3-fold cross validation:\\n') for clf, label in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier']): scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 3-fold cross validation: Accuracy: 0.91 (+/- 0.01) [KNN] Accuracy: 0.90 (+/- 0.03) [Random Forest] Accuracy: 0.92 (+/- 0.03) [Naive Bayes] Accuracy: 0.93 (+/- 0.02) [StackingClassifier] import matplotlib.pyplot as plt from mlxtend.plotting import plot_decision_regions import matplotlib.gridspec as gridspec import itertools gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) for clf, lab, grd in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingCVClassifier'], itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf) plt.title(lab) plt.show() Example 2 - Using Probabilities as Meta-Features Alternatively, the class-probabilities of the first-level 
classifiers can be used to train the meta-classifier (2nd-level classifier) by setting use_probas=True . For example, in a 3-class setting with 2 level-1 classifiers, these classifiers may make the following \"probability\" predictions for 1 training sample: classifier 1: [0.2, 0.5, 0.3] classifier 2: [0.3, 0.3, 0.4] This results in k features, where k = n_classes * n_classifiers, by stacking these level-1 probabilities: [0.2, 0.5, 0.3, 0.3, 0.3, 0.4] clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() # The StackingCVClassifier uses scikit-learn's check_cv # internally, which doesn't support a random seed. Thus # NumPy's random seed needs to be specified explicitly for # deterministic behavior np.random.seed(RANDOM_SEED) sclf = StackingCVClassifier(classifiers=[clf1, clf2, clf3], use_probas=True, meta_classifier=lr) print('3-fold cross validation:\\n') for clf, label in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier']): scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 3-fold cross validation: Accuracy: 0.91 (+/- 0.01) [KNN] Accuracy: 0.91 (+/- 0.06) [Random Forest] Accuracy: 0.92 (+/- 0.03) [Naive Bayes] Accuracy: 0.95 (+/- 0.04) [StackingClassifier] Example 3 - Stacked CV Classification and GridSearch To set up a parameter grid for scikit-learn's GridSearchCV , we simply provide the estimators' names in the parameter grid -- in the special case of the meta-classifier, we append the 'meta-' prefix. 
from sklearn.linear_model import LogisticRegression from sklearn.neighbors import KNeighborsClassifier from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import GridSearchCV from mlxtend.classifier import StackingCVClassifier # Initializing models clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=RANDOM_SEED) clf3 = GaussianNB() lr = LogisticRegression() # The StackingCVClassifier uses scikit-learn's check_cv # internally, which doesn't support a random seed. Thus # NumPy's random seed needs to be specified explicitly for # deterministic behavior np.random.seed(RANDOM_SEED) sclf = StackingCVClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr) params = {'kneighborsclassifier__n_neighbors': [1, 5], 'randomforestclassifier__n_estimators': [10, 50], 'meta-logisticregression__C': [0.1, 10.0]} grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5, refit=True) grid.fit(X, y) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) print('Best parameters: %s' % grid.best_params_) print('Accuracy: %.2f' % grid.best_score_) 0.673 +/- 0.01 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.920 +/- 0.02 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.893 +/- 0.02 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 
'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.947 +/- 0.02 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.947 +/- 0.02 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Best parameters: {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} Accuracy: 0.95 In case we are planning to use a classification algorithm multiple times, all we need to do is add a number suffix to its name in the parameter grid, as shown below: from sklearn.model_selection import GridSearchCV # Initializing models clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=RANDOM_SEED) clf3 = GaussianNB() lr = LogisticRegression() # The StackingCVClassifier uses scikit-learn's check_cv # internally, which doesn't support a random seed. 
Thus # NumPy's random seed needs to be specified explicitly for # deterministic behavior np.random.seed(RANDOM_SEED) sclf = StackingCVClassifier(classifiers=[clf1, clf1, clf2, clf3], meta_classifier=lr) params = {'kneighborsclassifier-1__n_neighbors': [1, 5], 'kneighborsclassifier-2__n_neighbors': [1, 5], 'randomforestclassifier__n_estimators': [10, 50], 'meta-logisticregression__C': [0.1, 10.0]} grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5, refit=True) grid.fit(X, y) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) print('Best parameters: %s' % grid.best_params_) print('Accuracy: %.2f' % grid.best_score_) 0.673 +/- 0.01 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.920 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.893 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.947 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 
'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.940 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.953 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.927 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.940 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.940 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Best parameters: {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} Accuracy: 0.95 Note The StackingCVClassifier also enables grid search over the 
classifiers argument. However, due to the current implementation of GridSearchCV in scikit-learn, it is not possible to search over both different classifiers and classifier parameters at the same time. For instance, while the following parameter dictionary works, params = {'randomforestclassifier__n_estimators': [1, 100], 'classifiers': [(clf1, clf1, clf1), (clf2, clf3)]}, it will use the instance settings of clf1 , clf2 , and clf3 and not overwrite them with the 'n_estimators' settings from 'randomforestclassifier__n_estimators': [1, 100] . Example 4 - Stacking of Classifiers that Operate on Different Feature Subsets The different level-1 classifiers can be fit to different subsets of features in the training dataset. The following example illustrates how this can be done on a technical level using scikit-learn pipelines and the ColumnSelector : from sklearn.datasets import load_iris from mlxtend.classifier import StackingCVClassifier from mlxtend.feature_selection import ColumnSelector from sklearn.pipeline import make_pipeline from sklearn.linear_model import LogisticRegression iris = load_iris() X = iris.data y = iris.target pipe1 = make_pipeline(ColumnSelector(cols=(0, 2)), LogisticRegression()) pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)), LogisticRegression()) sclf = StackingCVClassifier(classifiers=[pipe1, pipe2], meta_classifier=LogisticRegression()) sclf.fit(X, y) StackingCVClassifier(classifiers=[Pipeline(steps=[('columnselector', ColumnSelector(cols=(0, 2))), ('logisticregression', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solve...='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False))])], cv=2, meta_classifier=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, 
solver='liblinear', tol=0.0001, verbose=0, warm_start=False), shuffle=True, stratify=True, use_features_in_secondary=False, use_probas=False, verbose=0) API StackingCVClassifier(classifiers, meta_classifier, use_probas=False, cv=2, use_features_in_secondary=False, stratify=True, shuffle=True, verbose=0, store_train_meta_features=False, use_clones=True) A 'Stacking Cross-Validation' classifier for scikit-learn estimators. New in mlxtend v0.4.3 Notes The StackingCVClassifier uses scikit-learn's check_cv internally, which doesn't support a random seed. Thus NumPy's random seed needs to be specified explicitly for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVClassifier. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingCVClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_ . meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. cv : int, cross-validation generator or an iterable, optional (default: 2) Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 2-fold cross validation, - integer, to specify the number of folds in a (Stratified)KFold , - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. For integer/None inputs, it will use either a KFold or StratifiedKFold cross validation depending on the value of the stratify argument. use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. 
stratify : bool (default: True) If True and the cv argument is an integer, it will follow a stratified K-fold cross-validation technique. If the cv argument is a specific cross-validation technique, this argument is ignored. shuffle : bool (default: True) If True and the cv argument is an integer, the training data will be shuffled at the fitting stage prior to cross-validation. If the cv argument is a specific cross-validation technique, this argument is ignored. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted and which fold is currently being used for fitting - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit . use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingCVClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. 
Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/ Methods fit(X, y, groups=None, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values. groups : numpy array/None, shape = [n_samples] The group that each sample belongs to. This is used by specific folding strategies such as GroupKFold(). sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] Predicted class labels. 
predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self","title":"StackingCVClassifier"},{"location":"user_guide/classifier/StackingCVClassifier/#stackingcvclassifier","text":"An ensemble-learning meta-classifier for stacking using cross-validation to prepare the inputs for the level-2 classifier to prevent overfitting. from mlxtend.classifier import StackingCVClassifier","title":"StackingCVClassifier"},{"location":"user_guide/classifier/StackingCVClassifier/#overview","text":"Stacking is an ensemble learning technique to combine multiple classification models via a meta-classifier. 
The StackingCVClassifier extends the standard stacking algorithm (implemented as StackingClassifier ) using cross-validation to prepare the input data for the level-2 classifier. In the standard stacking procedure, the first-level classifiers are fit to the same training set that is used to prepare the inputs for the second-level classifier, which may lead to overfitting. The StackingCVClassifier , however, uses the concept of cross-validation: the dataset is split into k folds, and in k successive rounds, k-1 folds are used to fit the first-level classifiers; in each round, the first-level classifiers are then applied to the remaining fold that was not used for model fitting. The resulting predictions are then stacked and provided -- as input data -- to the second-level classifier. After the training of the StackingCVClassifier , the first-level classifiers are fit to the entire dataset as illustrated in the figure below. More formally, the Stacking Cross-Validation algorithm can be summarized as follows (source: [1]):","title":"Overview"},{"location":"user_guide/classifier/StackingCVClassifier/#references","text":"[1] Tang, J., S. Alelyani, and H. Liu. \" Data Classification: Algorithms and Applications. \" Data Mining and Knowledge Discovery Series, CRC Press (2015): pp. 498-500. [2] Wolpert, David H. \" Stacked generalization. 
\" Neural networks 5.2 (1992): 241-259.","title":"References"},{"location":"user_guide/classifier/StackingCVClassifier/#example-1-simple-stacking-cv-classification","text":"from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.neighbors import KNeighborsClassifier from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import StackingCVClassifier import numpy as np RANDOM_SEED = 42 clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=RANDOM_SEED) clf3 = GaussianNB() lr = LogisticRegression() # The StackingCVClassifier uses scikit-learn's check_cv # internally, which doesn't support a random seed. Thus # NumPy's random seed needs to be specified explicitly for # deterministic behavior np.random.seed(RANDOM_SEED) sclf = StackingCVClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr) print('3-fold cross validation:\\n') for clf, label in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier']): scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 3-fold cross validation: Accuracy: 0.91 (+/- 0.01) [KNN] Accuracy: 0.90 (+/- 0.03) [Random Forest] Accuracy: 0.92 (+/- 0.03) [Naive Bayes] Accuracy: 0.93 (+/- 0.02) [StackingClassifier] import matplotlib.pyplot as plt from mlxtend.plotting import plot_decision_regions import matplotlib.gridspec as gridspec import itertools gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) for clf, lab, grd in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingCVClassifier'], itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf) 
plt.title(lab) plt.show()","title":"Example 1 - Simple Stacking CV Classification"},{"location":"user_guide/classifier/StackingCVClassifier/#example-2-using-probabilities-as-meta-features","text":"Alternatively, the class-probabilities of the first-level classifiers can be used to train the meta-classifier (2nd-level classifier) by setting use_probas=True . For example, in a 3-class setting with 2 level-1 classifiers, these classifiers may make the following \"probability\" predictions for 1 training sample: classifier 1: [0.2, 0.5, 0.3] classifier 2: [0.3, 0.3, 0.4] This results in k features, where k = n_classes * n_classifiers, by stacking these level-1 probabilities: [0.2, 0.5, 0.3, 0.3, 0.3, 0.4] clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() # The StackingCVClassifier uses scikit-learn's check_cv # internally, which doesn't support a random seed. Thus # NumPy's random seed needs to be specified explicitly for # deterministic behavior np.random.seed(RANDOM_SEED) sclf = StackingCVClassifier(classifiers=[clf1, clf2, clf3], use_probas=True, meta_classifier=lr) print('3-fold cross validation:\\n') for clf, label in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier']): scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 3-fold cross validation: Accuracy: 0.91 (+/- 0.01) [KNN] Accuracy: 0.91 (+/- 0.06) [Random Forest] Accuracy: 0.92 (+/- 0.03) [Naive Bayes] Accuracy: 0.95 (+/- 0.04) [StackingClassifier]","title":"Example 2 - Using Probabilities as Meta-Features"},{"location":"user_guide/classifier/StackingCVClassifier/#example-3-stacked-cv-classification-and-gridsearch","text":"To set up a parameter grid for scikit-learn's GridSearchCV , we simply provide the estimators' names in the parameter grid -- in the special case of the 
meta-classifier, we append the 'meta-' prefix. from sklearn.linear_model import LogisticRegression from sklearn.neighbors import KNeighborsClassifier from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import GridSearchCV from mlxtend.classifier import StackingCVClassifier # Initializing models clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=RANDOM_SEED) clf3 = GaussianNB() lr = LogisticRegression() # The StackingCVClassifier uses scikit-learn's check_cv # internally, which doesn't support a random seed. Thus # NumPy's random seed needs to be specified explicitly for # deterministic behavior np.random.seed(RANDOM_SEED) sclf = StackingCVClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr) params = {'kneighborsclassifier__n_neighbors': [1, 5], 'randomforestclassifier__n_estimators': [10, 50], 'meta-logisticregression__C': [0.1, 10.0]} grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5, refit=True) grid.fit(X, y) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) print('Best parameters: %s' % grid.best_params_) print('Accuracy: %.2f' % grid.best_score_) 0.673 +/- 0.01 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.920 +/- 0.02 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.893 +/- 0.02 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 5, 
'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.947 +/- 0.02 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.947 +/- 0.02 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Best parameters: {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} Accuracy: 0.95 In case we are planning to use a classification algorithm multiple times, all we need to do is add a number suffix to its name in the parameter grid, as shown below: from sklearn.model_selection import GridSearchCV # Initializing models clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=RANDOM_SEED) clf3 = GaussianNB() lr = LogisticRegression() # The StackingCVClassifier uses scikit-learn's check_cv # internally, which doesn't support a random seed. 
Thus # NumPy's random seed needs to be specified explicitly for # deterministic behavior np.random.seed(RANDOM_SEED) sclf = StackingCVClassifier(classifiers=[clf1, clf1, clf2, clf3], meta_classifier=lr) params = {'kneighborsclassifier-1__n_neighbors': [1, 5], 'kneighborsclassifier-2__n_neighbors': [1, 5], 'randomforestclassifier__n_estimators': [10, 50], 'meta-logisticregression__C': [0.1, 10.0]} grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5, refit=True) grid.fit(X, y) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) print('Best parameters: %s' % grid.best_params_) print('Accuracy: %.2f' % grid.best_score_) 0.673 +/- 0.01 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.920 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.893 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.947 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 
'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.940 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.953 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.927 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.940 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.940 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Best parameters: {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} Accuracy: 0.95 Note The StackingCVClassifier also enables grid search over the 
classifiers argument. However, due to the current implementation of GridSearchCV in scikit-learn, it is not possible to search over both different classifiers and classifier parameters at the same time. For instance, while the following parameter dictionary works, params = {'randomforestclassifier__n_estimators': [1, 100], 'classifiers': [(clf1, clf1, clf1), (clf2, clf3)]}, it will use the instance settings of clf1, clf2, and clf3 and not overwrite them with the 'n_estimators' settings from 'randomforestclassifier__n_estimators': [1, 100].","title":"Example 3 - Stacked CV Classification and GridSearch"},{"location":"user_guide/classifier/StackingCVClassifier/#example-4-stacking-of-classifiers-that-operate-on-different-feature-subsets","text":"The different level-1 classifiers can be fit to different subsets of features in the training dataset. The following example illustrates how this can be done on a technical level using scikit-learn pipelines and the ColumnSelector: from sklearn.datasets import load_iris from mlxtend.classifier import StackingCVClassifier from mlxtend.feature_selection import ColumnSelector from sklearn.pipeline import make_pipeline from sklearn.linear_model import LogisticRegression iris = load_iris() X = iris.data y = iris.target pipe1 = make_pipeline(ColumnSelector(cols=(0, 2)), LogisticRegression()) pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)), LogisticRegression()) sclf = StackingCVClassifier(classifiers=[pipe1, pipe2], meta_classifier=LogisticRegression()) sclf.fit(X, y) StackingCVClassifier(classifiers=[Pipeline(steps=[('columnselector', ColumnSelector(cols=(0, 2))), ('logisticregression', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solve...='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False))])], cv=2, meta_classifier=LogisticRegression(C=1.0, class_weight=None,
dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False), shuffle=True, stratify=True, use_features_in_secondary=False, use_probas=False, verbose=0)","title":"Example 4 - Stacking of Classifiers that Operate on Different Feature Subsets"},{"location":"user_guide/classifier/StackingCVClassifier/#api","text":"StackingCVClassifier(classifiers, meta_classifier, use_probas=False, cv=2, use_features_in_secondary=False, stratify=True, shuffle=True, verbose=0, store_train_meta_features=False, use_clones=True) A 'Stacking Cross-Validation' classifier for scikit-learn estimators. New in mlxtend v0.4.3 Notes The StackingCVClassifier uses scikit-learn's check_cv internally, which doesn't support a random seed. Thus NumPy's random seed needs to be specified explicitly for deterministic behavior, for instance, by setting np.random.seed(RANDOM_SEED) prior to fitting the StackingCVClassifier. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingCVClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_. meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers. use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. cv : int, cross-validation generator or an iterable, optional (default: 2) Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 2-fold cross validation, - integer, to specify the number of folds in a (Stratified)KFold, - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. For integer/None inputs, it will use either a KFold or StratifiedKFold cross validation depending on the value of the stratify argument.
use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. stratify : bool (default: True) If True and the cv argument is an integer, it will follow a stratified K-fold cross-validation technique. If the cv argument is a specific cross-validation technique, this argument is ignored. shuffle : bool (default: True) If True and the cv argument is an integer, the training data will be shuffled at the fitting stage prior to cross-validation. If the cv argument is a specific cross-validation technique, this argument is ignored. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted and which fold is currently being used for fitting - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingCVClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function.
Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] Meta-features for the training data, where n_samples is the number of samples in the training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/","title":"API"},{"location":"user_guide/classifier/StackingCVClassifier/#methods","text":"fit(X, y, groups=None, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : numpy array, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : numpy array, shape = [n_samples] Target values. groups : numpy array/None, shape = [n_samples] The group that each sample belongs to. This is used by specific folding strategies such as GroupKFold(). sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features.
Returns labels : array-like, shape = [n_samples] Predicted class labels. predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self","title":"Methods"},{"location":"user_guide/classifier/StackingClassifier/","text":"StackingClassifier An ensemble-learning meta-classifier for stacking. from mlxtend.classifier import StackingClassifier Overview Stacking is an ensemble learning technique to combine multiple classification models via a meta-classifier.
The individual classification models are trained based on the complete training set; then, the meta-classifier is fitted based on the outputs -- meta-features -- of the individual classification models in the ensemble. The meta-classifier can either be trained on the predicted class labels or probabilities from the ensemble. The algorithm can be summarized as follows (source: [1]): References [1] Tang, J., S. Alelyani, and H. Liu. \"Data Classification: Algorithms and Applications.\" Data Mining and Knowledge Discovery Series, CRC Press (2015): pp. 498-500. [2] Wolpert, David H. \"Stacked generalization.\" Neural Networks 5.2 (1992): 241-259. Example 1 - Simple Stacked Classification from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.neighbors import KNeighborsClassifier from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import StackingClassifier import numpy as np clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr) print('3-fold cross validation:\\n') for clf, label in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier']): scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 3-fold cross validation: Accuracy: 0.91 (+/- 0.01) [KNN] Accuracy: 0.91 (+/- 0.06) [Random Forest] Accuracy: 0.92 (+/- 0.03) [Naive Bayes] Accuracy: 0.95 (+/- 0.03) [StackingClassifier] import matplotlib.pyplot as plt from mlxtend.plotting import plot_decision_regions import matplotlib.gridspec as gridspec import itertools gs = gridspec.GridSpec(2, 2) fig =
plt.figure(figsize=(10,8)) for clf, lab, grd in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier'], itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf) plt.title(lab) Example 2 - Using Probabilities as Meta-Features Alternatively, the class probabilities of the first-level classifiers can be used to train the meta-classifier (2nd-level classifier) by setting use_probas=True. If average_probas=True, the probabilities of the level-1 classifiers are averaged; if average_probas=False, the probabilities are stacked (recommended). For example, in a 3-class setting with 2 level-1 classifiers, these classifiers may make the following \"probability\" predictions for 1 training sample: classifier 1: [0.2, 0.5, 0.3] classifier 2: [0.3, 0.4, 0.3] If average_probas=True, the meta-features would be: [0.25, 0.45, 0.3] In contrast, using average_probas=False results in k = n_classes * n_classifiers features by stacking these level-1 probabilities: [0.2, 0.5, 0.3, 0.3, 0.4, 0.3] clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], use_probas=True, average_probas=False, meta_classifier=lr) print('3-fold cross validation:\\n') for clf, label in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier']): scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 3-fold cross validation: Accuracy: 0.91 (+/- 0.01) [KNN] Accuracy: 0.91 (+/- 0.06) [Random Forest] Accuracy: 0.92 (+/- 0.03) [Naive Bayes] Accuracy: 0.94 (+/- 0.03) [StackingClassifier] Example 3 - Stacked Classification and GridSearch To set up a parameter grid for scikit-learn's GridSearch, we simply provide the
estimator names in the parameter grid -- in the special case of the meta-classifier, we add the 'meta-' prefix. from sklearn.linear_model import LogisticRegression from sklearn.neighbors import KNeighborsClassifier from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import GridSearchCV from mlxtend.classifier import StackingClassifier # Initializing models clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr) params = {'kneighborsclassifier__n_neighbors': [1, 5], 'randomforestclassifier__n_estimators': [10, 50], 'meta-logisticregression__C': [0.1, 10.0]} grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5, refit=True) grid.fit(X, y) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) print('Best parameters: %s' % grid.best_params_) print('Accuracy: %.2f' % grid.best_score_) 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.927 +/- 0.02 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.913 +/- 0.03 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 0.1,
'randomforestclassifier__n_estimators': 50} 0.933 +/- 0.02 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.940 +/- 0.02 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Best parameters: {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Accuracy: 0.94 If we plan to use the same classifier multiple times, all we need to do is add a number suffix to its name in the parameter grid, as shown below: from sklearn.model_selection import GridSearchCV # Initializing models clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() sclf = StackingClassifier(classifiers=[clf1, clf1, clf2, clf3], meta_classifier=lr) params = {'kneighborsclassifier-1__n_neighbors': [1, 5], 'kneighborsclassifier-2__n_neighbors': [1, 5], 'randomforestclassifier__n_estimators': [10, 50], 'meta-logisticregression__C': [0.1, 10.0]} grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5, refit=True) grid.fit(X, y) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) print('Best parameters: %s' % grid.best_params_) print('Accuracy: %.2f' % grid.best_score_) 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.907 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 1,
'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.913 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.927 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.913 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.927 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.913 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 
'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.933 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.940 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Best parameters: {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Accuracy: 0.94 Note The StackingClassifier also enables grid search over the classifiers argument. However, due to the current implementation of GridSearchCV in scikit-learn, it is not possible to search over both different classifiers and classifier parameters at the same time. For instance, while the following parameter dictionary works, params = {'randomforestclassifier__n_estimators': [1, 100], 'classifiers': [(clf1, clf1, clf1), (clf2, clf3)]}, it will use the instance settings of clf1, clf2, and clf3 and not overwrite them with the 'n_estimators' settings from 'randomforestclassifier__n_estimators': [1, 100]. Example 4 - Stacking of Classifiers that Operate on Different Feature Subsets The different level-1 classifiers can be fit to different subsets of features in the training dataset.
The following example illustrates how this can be done on a technical level using scikit-learn pipelines and the ColumnSelector: from sklearn.datasets import load_iris from mlxtend.classifier import StackingClassifier from mlxtend.feature_selection import ColumnSelector from sklearn.pipeline import make_pipeline from sklearn.linear_model import LogisticRegression iris = load_iris() X = iris.data y = iris.target pipe1 = make_pipeline(ColumnSelector(cols=(0, 2)), LogisticRegression()) pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)), LogisticRegression()) sclf = StackingClassifier(classifiers=[pipe1, pipe2], meta_classifier=LogisticRegression()) sclf.fit(X, y) StackingClassifier(average_probas=False, classifiers=[Pipeline(steps=[('columnselector', ColumnSelector(cols=(0, 2))), ('logisticregression', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solve...='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False))])], meta_classifier=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False), use_features_in_secondary=False, use_probas=False, verbose=0) API StackingClassifier(classifiers, meta_classifier, use_probas=False, average_probas=False, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, use_clones=True) A Stacking classifier for scikit-learn estimators for classification. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. Invoking the fit method on the StackingClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_.
meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers. use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. average_probas : bool (default: False) Averages the probabilities as meta features if True. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit. use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function.
Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers) meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator) train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] Meta-features for the training data, where n_samples is the number of samples in the training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/ Methods fit(X, y, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_outputs] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] or [n_samples, n_outputs] Predicted class labels. predict_meta_features(X) Get meta-features of test data.
Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] or a list of n_outputs of such arrays if n_outputs > 1. Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self","title":"StackingClassifier"},{"location":"user_guide/classifier/StackingClassifier/#stackingclassifier","text":"An ensemble-learning meta-classifier for stacking. from mlxtend.classifier import StackingClassifier","title":"StackingClassifier"},{"location":"user_guide/classifier/StackingClassifier/#overview","text":"Stacking is an ensemble learning technique to combine multiple classification models via a meta-classifier.
The individual classification models are trained based on the complete training set; then, the meta-classifier is fitted based on the outputs -- meta-features -- of the individual classification models in the ensemble. The meta-classifier can either be trained on the predicted class labels or probabilities from the ensemble. The algorithm can be summarized as follows (source: [1]):","title":"Overview"},{"location":"user_guide/classifier/StackingClassifier/#references","text":"[1] Tang, J., S. Alelyani, and H. Liu. \"Data Classification: Algorithms and Applications.\" Data Mining and Knowledge Discovery Series, CRC Press (2015): pp. 498-500. [2] Wolpert, David H. \"Stacked generalization.\" Neural Networks 5.2 (1992): 241-259.","title":"References"},{"location":"user_guide/classifier/StackingClassifier/#example-1-simple-stacked-classification","text":"from sklearn import datasets iris = datasets.load_iris() X, y = iris.data[:, 1:3], iris.target from sklearn import model_selection from sklearn.linear_model import LogisticRegression from sklearn.neighbors import KNeighborsClassifier from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from mlxtend.classifier import StackingClassifier import numpy as np clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr) print('3-fold cross validation:\\n') for clf, label in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier']): scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 3-fold cross validation: Accuracy: 0.91 (+/- 0.01) [KNN] Accuracy: 0.91 (+/- 0.06) [Random Forest] Accuracy: 0.92 (+/- 0.03) [Naive Bayes] Accuracy: 0.95 (+/- 0.03) [StackingClassifier] import matplotlib.pyplot
as plt from mlxtend.plotting import plot_decision_regions import matplotlib.gridspec as gridspec import itertools gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) for clf, lab, grd in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier'], itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf) plt.title(lab)","title":"Example 1 - Simple Stacked Classification"},{"location":"user_guide/classifier/StackingClassifier/#example-2-using-probabilities-as-meta-features","text":"Alternatively, the class probabilities of the first-level classifiers can be used to train the meta-classifier (2nd-level classifier) by setting use_probas=True. If average_probas=True, the probabilities of the level-1 classifiers are averaged; if average_probas=False, the probabilities are stacked (recommended). For example, in a 3-class setting with 2 level-1 classifiers, these classifiers may make the following \"probability\" predictions for 1 training sample: classifier 1: [0.2, 0.5, 0.3] classifier 2: [0.3, 0.4, 0.3] If average_probas=True, the meta-features would be: [0.25, 0.45, 0.3] In contrast, using average_probas=False results in k = n_classes * n_classifiers features by stacking these level-1 probabilities: [0.2, 0.5, 0.3, 0.3, 0.4, 0.3] clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], use_probas=True, average_probas=False, meta_classifier=lr) print('3-fold cross validation:\\n') for clf, label in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier']): scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy') print(\"Accuracy: %0.2f (+/- %0.2f) [%s]\" % (scores.mean(), scores.std(), label)) 3-fold cross validation: Accuracy: 0.91 (+/- 0.01)
[KNN] Accuracy: 0.91 (+/- 0.06) [Random Forest] Accuracy: 0.92 (+/- 0.03) [Naive Bayes] Accuracy: 0.94 (+/- 0.03) [StackingClassifier]","title":"Example 2 - Using Probabilities as Meta-Features"},{"location":"user_guide/classifier/StackingClassifier/#example-3-stacked-classification-and-gridsearch","text":"To set up a parameter grid for scikit-learn's GridSearch , we simply provide the estimators' names in the parameter grid -- in the special case of the meta-classifier, we append the 'meta-' prefix. from sklearn.linear_model import LogisticRegression from sklearn.neighbors import KNeighborsClassifier from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import GridSearchCV from mlxtend.classifier import StackingClassifier # Initializing models clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr) params = {'kneighborsclassifier__n_neighbors': [1, 5], 'randomforestclassifier__n_estimators': [10, 50], 'meta-logisticregression__C': [0.1, 10.0]} grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5, refit=True) grid.fit(X, y) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) print('Best parameters: %s' % grid.best_params_) print('Accuracy: %.2f' % grid.best_score_) 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.927 +/- 0.02 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 
'randomforestclassifier__n_estimators': 10} 0.913 +/- 0.03 {'kneighborsclassifier__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.933 +/- 0.02 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.940 +/- 0.02 {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Best parameters: {'kneighborsclassifier__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Accuracy: 0.94 If we plan to use the same classification algorithm multiple times, all we need to do is add a numeric suffix to its name in the parameter grid, as shown below: from sklearn.model_selection import GridSearchCV # Initializing models clf1 = KNeighborsClassifier(n_neighbors=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() lr = LogisticRegression() sclf = StackingClassifier(classifiers=[clf1, clf1, clf2, clf3], meta_classifier=lr) params = {'kneighborsclassifier-1__n_neighbors': [1, 5], 'kneighborsclassifier-2__n_neighbors': [1, 5], 'randomforestclassifier__n_estimators': [10, 50], 'meta-logisticregression__C': [0.1, 10.0]} grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5, refit=True) grid.fit(X, y) cv_keys = ('mean_test_score', 'std_test_score', 'params') for r, _ in enumerate(grid.cv_results_['mean_test_score']): print(\"%0.3f +/- %0.2f %r\" % (grid.cv_results_[cv_keys[0]][r], grid.cv_results_[cv_keys[1]][r] / 2.0, grid.cv_results_[cv_keys[2]][r])) print('Best parameters: %s' % grid.best_params_) print('Accuracy: %.2f' % grid.best_score_) 0.667 +/- 0.00 
{'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.907 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.913 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.927 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.913 +/- 0.03 {'kneighborsclassifier-1__n_neighbors': 1, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.927 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.913 +/- 0.03 
{'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 1, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 10} 0.667 +/- 0.00 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 0.1, 'randomforestclassifier__n_estimators': 50} 0.933 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 10} 0.940 +/- 0.02 {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Best parameters: {'kneighborsclassifier-1__n_neighbors': 5, 'kneighborsclassifier-2__n_neighbors': 5, 'meta-logisticregression__C': 10.0, 'randomforestclassifier__n_estimators': 50} Accuracy: 0.94 Note The StackingClassifier also enables grid search over the classifiers argument. However, due to the current implementation of GridSearchCV in scikit-learn, it is not possible to search over different classifiers and classifier parameters at the same time. For instance, the following parameter dictionary works, params = {'randomforestclassifier__n_estimators': [1, 100], 'classifiers': [(clf1, clf1, clf1), (clf2, clf3)]} , but it will use the instance settings of clf1 , clf2 , and clf3 and will not overwrite them with the 'n_estimators' settings from 'randomforestclassifier__n_estimators': [1, 100] .","title":"Example 3 - Stacked Classification and GridSearch"},{"location":"user_guide/classifier/StackingClassifier/#example-4-stacking-of-classifiers-that-operate-on-different-feature-subsets","text":"The different level-1 classifiers can be fit to different subsets of features in the training dataset. 
The following example illustrates how this can be done on a technical level using scikit-learn pipelines and the ColumnSelector : from sklearn.datasets import load_iris from mlxtend.classifier import StackingClassifier from mlxtend.feature_selection import ColumnSelector from sklearn.pipeline import make_pipeline from sklearn.linear_model import LogisticRegression iris = load_iris() X = iris.data y = iris.target pipe1 = make_pipeline(ColumnSelector(cols=(0, 2)), LogisticRegression()) pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)), LogisticRegression()) sclf = StackingClassifier(classifiers=[pipe1, pipe2], meta_classifier=LogisticRegression()) sclf.fit(X, y) StackingClassifier(average_probas=False, classifiers=[Pipeline(steps=[('columnselector', ColumnSelector(cols=(0, 2))), ('logisticregression', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solve...='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False))])], meta_classifier=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False), use_features_in_secondary=False, use_probas=False, verbose=0)","title":"Example 4 - Stacking of Classifiers that Operate on Different Feature Subsets"},{"location":"user_guide/classifier/StackingClassifier/#api","text":"StackingClassifier(classifiers, meta_classifier, use_probas=False, average_probas=False, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, use_clones=True) A Stacking classifier for scikit-learn estimators for classification. Parameters classifiers : array-like, shape = [n_classifiers] A list of classifiers. 
Invoking the fit method on the StackingClassifier will fit clones of these original classifiers that will be stored in the class attribute self.clfs_ . meta_classifier : object The meta-classifier to be fitted on the ensemble of classifiers. use_probas : bool (default: False) If True, trains meta-classifier based on predicted probabilities instead of class labels. average_probas : bool (default: False) Averages the probabilities as meta features if True. verbose : int, optional (default=0) Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1 : Prints the number & name of the classifier being fitted - verbose=2 : Prints info about the parameters of the classifier being fitted - verbose>2 : Changes verbose param of the underlying classifier to self.verbose - 2 use_features_in_secondary : bool (default: False) If True, the meta-classifier will be trained both on the predictions of the original classifiers and the original dataset. If False, the meta-classifier will be trained only on the predictions of the original classifiers. store_train_meta_features : bool (default: False) If True, the meta-features computed from the training data and used for fitting the meta-classifier are stored in the self.train_meta_features_ array, which can be accessed after calling fit . use_clones : bool (default: True) Clones the classifiers for stacking classification if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Hence, if use_clones=True, the original input classifiers will remain unmodified upon using the StackingClassifier's fit method. Setting use_clones=False is recommended if you are working with estimators that support the scikit-learn fit/predict API but are not compatible with scikit-learn's clone function. 
Attributes clfs_ : list, shape=[n_classifiers] Fitted classifiers (clones of the original classifiers). meta_clf_ : estimator Fitted meta-classifier (clone of the original meta-estimator). train_meta_features_ : numpy array, shape = [n_samples, n_classifiers] Meta-features for training data, where n_samples is the number of samples in training data and n_classifiers is the number of classifiers. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/","title":"API"},{"location":"user_guide/classifier/StackingClassifier/#methods","text":"fit(X, y, sample_weight=None) Fit ensemble classifiers and the meta-classifier. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] or [n_samples, n_outputs] Target values. sample_weight : array-like, shape = [n_samples], optional Sample weights passed as sample_weight to each classifier in the classifiers list as well as the meta_classifier. Raises an error if some classifier does not support sample_weight in the fit() method. Returns self : object fit_transform(X, y=None, **fit_params) Fit to data, then transform it. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X. Parameters X : numpy array of shape [n_samples, n_features] Training set. y : numpy array of shape [n_samples] Target values. Returns X_new : numpy array of shape [n_samples, n_features_new] Transformed array. get_params(deep=True) Return estimator parameter names for GridSearch support. predict(X) Predict target values for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns labels : array-like, shape = [n_samples] or [n_samples, n_outputs] Predicted class labels. 
predict_meta_features(X) Get meta-features of test data. Parameters X : numpy array, shape = [n_samples, n_features] Test vectors, where n_samples is the number of samples and n_features is the number of features. Returns meta-features : numpy array, shape = [n_samples, n_classifiers] Returns the meta-features for test data. predict_proba(X) Predict class probabilities for X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns proba : array-like, shape = [n_samples, n_classes] or a list of n_outputs of such arrays if n_outputs > 1. Probability for each class per sample. score(X, y, sample_weight=None) Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. Parameters X : array-like, shape = (n_samples, n_features) Test samples. y : array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X. sample_weight : array-like, shape = [n_samples], optional Sample weights. Returns score : float Mean accuracy of self.predict(X) wrt. y. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self","title":"Methods"},{"location":"user_guide/cluster/Kmeans/","text":"Kmeans An implementation of k-means clustering. from mlxtend.cluster import Kmeans Overview Clustering falls into the category of unsupervised learning, a subfield of machine learning where the ground truth labels are not available to us in real-world applications. In clustering, our goal is to group samples by similarity (in k-means: Euclidean distance). 
The k-means algorithm can be summarized as follows: Randomly pick k centroids from the sample points as initial cluster centers. Assign each sample to the nearest centroid \\mu(j), \\; j \\in \\{1,...,k\\} . Move each centroid to the center of the samples that were assigned to it. Repeat steps 2 and 3 until the cluster assignments do not change or a user-defined tolerance or a maximum number of iterations is reached. References MacQueen, J. B. (1967). Some Methods for classification and Analysis of Multivariate Observations . Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. pp. 281\u2013297. MR 0214227. Zbl 0214.46201. Retrieved 2009-04-07. Example 1 - Three Blobs Load some sample data: import matplotlib.pyplot as plt from mlxtend.data import three_blobs_data X, y = three_blobs_data() plt.scatter(X[:, 0], X[:, 1], c='white') plt.show() Compute the cluster centroids: from mlxtend.cluster import Kmeans km = Kmeans(k=3, max_iter=50, random_seed=1, print_progress=3) km.fit(X) print('Iterations until convergence:', km.iterations_) print('Final centroids:\\n', km.centroids_) Iteration: 2/50 | Elapsed: 00:00:00 | ETA: 00:00:00 Iterations until convergence: 2 Final centroids: [[-1.5947298 2.92236966] [ 2.06521743 0.96137409] [ 0.9329651 4.35420713]] Visualize the cluster memberships: y_clust = km.predict(X) plt.scatter(X[y_clust == 0, 0], X[y_clust == 0, 1], s=50, c='lightgreen', marker='s', label='cluster 1') plt.scatter(X[y_clust == 1,0], X[y_clust == 1,1], s=50, c='orange', marker='o', label='cluster 2') plt.scatter(X[y_clust == 2,0], X[y_clust == 2,1], s=50, c='lightblue', marker='v', label='cluster 3') plt.scatter(km.centroids_[:,0], km.centroids_[:,1], s=250, marker='*', c='red', label='centroids') plt.legend(loc='lower left', scatterpoints=1) plt.grid() plt.show() API Kmeans(k, max_iter=10, convergence_tolerance=1e-05, random_seed=None, print_progress=0) K-means clustering class. 
Added in 0.4.1dev Parameters k : int Number of clusters. max_iter : int (default: 10) Maximum number of iterations during cluster assignment. Cluster re-assignment stops automatically when the algorithm has converged. convergence_tolerance : float (default: 1e-05) Compares current centroids with centroids of the previous iteration using the given tolerance (a small positive float) to determine if the algorithm converged early. random_seed : int (default: None) Set random state for the initial centroid assignment. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Iterations elapsed 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes centroids_ : 2d-array, shape=[k, n_features] Feature values of the k cluster centroids. clusters_ : dictionary The cluster assignments stored as a Python dictionary; the dictionary keys denote the cluster indices and the items are Python lists of the sample indices that were assigned to each cluster. iterations_ : int Number of iterations until convergence. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Kmeans/ Methods fit(X, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values.","title":"Kmeans"},{"location":"user_guide/cluster/Kmeans/#kmeans","text":"An implementation of k-means clustering. 
from mlxtend.cluster import Kmeans","title":"Kmeans"},{"location":"user_guide/cluster/Kmeans/#overview","text":"Clustering falls into the category of unsupervised learning, a subfield of machine learning where the ground truth labels are not available to us in real-world applications. In clustering, our goal is to group samples by similarity (in k-means: Euclidean distance). The k-means algorithm can be summarized as follows: Randomly pick k centroids from the sample points as initial cluster centers. Assign each sample to the nearest centroid \\mu(j), \\; j \\in \\{1,...,k\\} . Move each centroid to the center of the samples that were assigned to it. Repeat steps 2 and 3 until the cluster assignments do not change or a user-defined tolerance or a maximum number of iterations is reached.","title":"Overview"},{"location":"user_guide/cluster/Kmeans/#references","text":"MacQueen, J. B. (1967). Some Methods for classification and Analysis of Multivariate Observations . Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. pp. 281\u2013297. MR 0214227. Zbl 0214.46201. 
Retrieved 2009-04-07.","title":"References"},{"location":"user_guide/cluster/Kmeans/#example-1-three-blobs","text":"","title":"Example 1 - Three Blobs"},{"location":"user_guide/cluster/Kmeans/#load-some-sample-data","text":"import matplotlib.pyplot as plt from mlxtend.data import three_blobs_data X, y = three_blobs_data() plt.scatter(X[:, 0], X[:, 1], c='white') plt.show()","title":"Load some sample data:"},{"location":"user_guide/cluster/Kmeans/#compute-the-cluster-centroids","text":"from mlxtend.cluster import Kmeans km = Kmeans(k=3, max_iter=50, random_seed=1, print_progress=3) km.fit(X) print('Iterations until convergence:', km.iterations_) print('Final centroids:\\n', km.centroids_) Iteration: 2/50 | Elapsed: 00:00:00 | ETA: 00:00:00 Iterations until convergence: 2 Final centroids: [[-1.5947298 2.92236966] [ 2.06521743 0.96137409] [ 0.9329651 4.35420713]]","title":"Compute the cluster centroids:"},{"location":"user_guide/cluster/Kmeans/#visualize-the-cluster-memberships","text":"y_clust = km.predict(X) plt.scatter(X[y_clust == 0, 0], X[y_clust == 0, 1], s=50, c='lightgreen', marker='s', label='cluster 1') plt.scatter(X[y_clust == 1,0], X[y_clust == 1,1], s=50, c='orange', marker='o', label='cluster 2') plt.scatter(X[y_clust == 2,0], X[y_clust == 2,1], s=50, c='lightblue', marker='v', label='cluster 3') plt.scatter(km.centroids_[:,0], km.centroids_[:,1], s=250, marker='*', c='red', label='centroids') plt.legend(loc='lower left', scatterpoints=1) plt.grid() plt.show()","title":"Visualize the cluster memberships:"},{"location":"user_guide/cluster/Kmeans/#api","text":"Kmeans(k, max_iter=10, convergence_tolerance=1e-05, random_seed=None, print_progress=0) K-means clustering class. Added in 0.4.1dev Parameters k : int Number of clusters. max_iter : int (default: 10) Maximum number of iterations during cluster assignment. Cluster re-assignment stops automatically when the algorithm has converged. 
convergence_tolerance : float (default: 1e-05) Compares current centroids with centroids of the previous iteration using the given tolerance (a small positive float) to determine if the algorithm converged early. random_seed : int (default: None) Set random state for the initial centroid assignment. print_progress : int (default: 0) Prints progress in fitting to stderr. 0: No output 1: Iterations elapsed 2: 1 plus time elapsed 3: 2 plus estimated time until completion Attributes centroids_ : 2d-array, shape=[k, n_features] Feature values of the k cluster centroids. clusters_ : dictionary The cluster assignments stored as a Python dictionary; the dictionary keys denote the cluster indices and the items are Python lists of the sample indices that were assigned to each cluster. iterations_ : int Number of iterations until convergence. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/classifier/Kmeans/","title":"API"},{"location":"user_guide/cluster/Kmeans/#methods","text":"fit(X, init_params=True) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. init_params : bool (default: True) Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting. Returns self : object predict(X) Predict targets from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns target_values : array-like, shape = [n_samples] Predicted target values.","title":"Methods"},{"location":"user_guide/data/autompg_data/","text":"Auto MPG A function that loads the autompg dataset into NumPy arrays. from mlxtend.data import autompg_data Overview The Auto-MPG dataset for regression analysis. 
The target ( y ) is defined as the miles per gallon (mpg) for 392 automobiles (6 rows containing \"NaN\"s have been removed). The 8 feature columns are: Features cylinders: multi-valued discrete displacement: continuous horsepower: continuous weight: continuous acceleration: continuous model year: multi-valued discrete origin: multi-valued discrete car name: string (unique for each instance) Number of samples: 392 Target variable (continuous): mpg References Source: https://archive.ics.uci.edu/ml/datasets/Auto+MPG Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann. Example - Dataset overview from mlxtend.data import autompg_data X, y = autompg_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('\\nHeader: %s' % ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name']) print('1st row', X[0]) Dimensions: 392 x 8 Header: ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name'] 1st row [ 8.00000000e+00 3.07000000e+02 1.30000000e+02 3.50400000e+03 1.20000000e+01 7.00000000e+01 1.00000000e+00 nan] Note that the feature array contains a string column (\"car name\"); thus, it is recommended to select the features as needed and convert them into a float array for further analysis. The example below shows how to get rid of the car name column and cast the NumPy array as a float array. X[:, :-1].astype(float) array([[ 8. , 307. , 130. , ..., 12. , 70. , 1. ], [ 8. , 350. , 165. , ..., 11.5, 70. , 1. ], [ 8. , 318. , 150. , ..., 11. , 70. , 1. ], ..., [ 4. , 135. , 84. , ..., 11.6, 82. , 1. ], [ 4. , 120. , 79. , ..., 18.6, 82. , 1. ], [ 4. , 119. , 82. , ..., 19.4, 82. , 1. ]]) API autompg_data() Auto MPG dataset. 
Source : https://archive.ics.uci.edu/ml/datasets/Auto+MPG Number of samples : 392 Continuous target variable : mpg Dataset Attributes: 1) cylinders: multi-valued discrete 2) displacement: continuous 3) horsepower: continuous 4) weight: continuous 5) acceleration: continuous 6) model year: multi-valued discrete 7) origin: multi-valued discrete 8) car name: string (unique for each instance) Returns X, y : [n_samples, n_features], [n_targets] X is the feature matrix with 392 auto samples as rows and 8 feature columns (6 rows with NaNs removed). y is a 1-dimensional array of the target MPG values. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/autompg_data/","title":"Auto MPG"},{"location":"user_guide/data/autompg_data/#auto-mpg","text":"A function that loads the autompg dataset into NumPy arrays. from mlxtend.data import autompg_data","title":"Auto MPG"},{"location":"user_guide/data/autompg_data/#overview","text":"The Auto-MPG dataset for regression analysis. The target ( y ) is defined as the miles per gallon (mpg) for 392 automobiles (6 rows containing \"NaN\"s have been removed). The 8 feature columns are: Features cylinders: multi-valued discrete displacement: continuous horsepower: continuous weight: continuous acceleration: continuous model year: multi-valued discrete origin: multi-valued discrete car name: string (unique for each instance) Number of samples: 392 Target variable (continuous): mpg","title":"Overview"},{"location":"user_guide/data/autompg_data/#references","text":"Source: https://archive.ics.uci.edu/ml/datasets/Auto+MPG Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. 
Morgan Kaufmann.","title":"References"},{"location":"user_guide/data/autompg_data/#example-dataset-overview","text":"from mlxtend.data import autompg_data X, y = autompg_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('\\nHeader: %s' % ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name']) print('1st row', X[0]) Dimensions: 392 x 8 Header: ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name'] 1st row [ 8.00000000e+00 3.07000000e+02 1.30000000e+02 3.50400000e+03 1.20000000e+01 7.00000000e+01 1.00000000e+00 nan] Note that the feature array contains a string column (\"car name\"); thus, it is recommended to select the features as needed and convert them into a float array for further analysis. The example below shows how to get rid of the car name column and cast the NumPy array as a float array. X[:, :-1].astype(float) array([[ 8. , 307. , 130. , ..., 12. , 70. , 1. ], [ 8. , 350. , 165. , ..., 11.5, 70. , 1. ], [ 8. , 318. , 150. , ..., 11. , 70. , 1. ], ..., [ 4. , 135. , 84. , ..., 11.6, 82. , 1. ], [ 4. , 120. , 79. , ..., 18.6, 82. , 1. ], [ 4. , 119. , 82. , ..., 19.4, 82. , 1. ]])","title":"Example - Dataset overview"},{"location":"user_guide/data/autompg_data/#api","text":"autompg_data() Auto MPG dataset. Source : https://archive.ics.uci.edu/ml/datasets/Auto+MPG Number of samples : 392 Continuous target variable : mpg Dataset Attributes: 1) cylinders: multi-valued discrete 2) displacement: continuous 3) horsepower: continuous 4) weight: continuous 5) acceleration: continuous 6) model year: multi-valued discrete 7) origin: multi-valued discrete 8) car name: string (unique for each instance) Returns X, y : [n_samples, n_features], [n_targets] X is the feature matrix with 392 auto samples as rows and 8 feature columns (6 rows with NaNs removed). y is a 1-dimensional array of the target MPG values. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/autompg_data/","title":"API"},{"location":"user_guide/data/boston_housing_data/","text":"Boston Housing Data A function that loads the boston_housing_data dataset into NumPy arrays. from mlxtend.data import boston_housing_data Overview The Boston Housing dataset for regression analysis. Features CRIM: per capita crime rate by town ZN: proportion of residential land zoned for lots over 25,000 sq.ft. INDUS: proportion of non-retail business acres per town CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) NOX: nitric oxides concentration (parts per 10 million) RM: average number of rooms per dwelling AGE: proportion of owner-occupied units built prior to 1940 DIS: weighted distances to five Boston employment centres RAD: index of accessibility to radial highways TAX: full-value property-tax rate per $10,000 PTRATIO: pupil-teacher ratio by town B: 1000(Bk - 0.63)^2 where Bk is the proportion of b. by town LSTAT: % lower status of the population Number of samples: 506 Target variable (continuous): MEDV, Median value of owner-occupied homes in $1000's References Source: https://archive.ics.uci.edu/ml/datasets/Housing Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Example 1 - Dataset overview from mlxtend.data import boston_housing_data X, y = boston_housing_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('1st row', X[0]) (506, 14) Dimensions: 506 x 13 1st row [ 6.32000000e-03 1.80000000e+01 2.31000000e+00 0.00000000e+00 5.38000000e-01 6.57500000e+00 6.52000000e+01 4.09000000e+00 1.00000000e+00 2.96000000e+02 1.53000000e+01 3.96900000e+02 4.98000000e+00] API boston_housing_data() Boston Housing dataset. 
Source : https://archive.ics.uci.edu/ml/datasets/Housing Number of samples : 506 Continuous target variable : MEDV MEDV = Median value of owner-occupied homes in $1000's Dataset Attributes: 1) CRIM per capita crime rate by town 2) ZN proportion of residential land zoned for lots over 25,000 sq.ft. 3) INDUS proportion of non-retail business acres per town 4) CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) 5) NOX nitric oxides concentration (parts per 10 million) 6) RM average number of rooms per dwelling 7) AGE proportion of owner-occupied units built prior to 1940 8) DIS weighted distances to five Boston employment centres 9) RAD index of accessibility to radial highways 10) TAX full-value property-tax rate per $10,000 11) PTRATIO pupil-teacher ratio by town 12) B 1000(Bk - 0.63)^2 where Bk is the prop. of b. by town 13) LSTAT % lower status of the population Returns X, y : [n_samples, n_features], [n_samples] X is the feature matrix with 506 housing samples as rows and 13 feature columns. y is a 1-dimensional array of the continuous target variable MEDV. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/boston_housing_data/","title":"Boston Housing Data"},{"location":"user_guide/data/boston_housing_data/#boston-housing-data","text":"A function that loads the boston_housing_data dataset into NumPy arrays. from mlxtend.data import boston_housing_data","title":"Boston Housing Data"},{"location":"user_guide/data/boston_housing_data/#overview","text":"The Boston Housing dataset for regression analysis. Features CRIM: per capita crime rate by town ZN: proportion of residential land zoned for lots over 25,000 sq.ft. 
INDUS: proportion of non-retail business acres per town CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) NOX: nitric oxides concentration (parts per 10 million) RM: average number of rooms per dwelling AGE: proportion of owner-occupied units built prior to 1940 DIS: weighted distances to five Boston employment centres RAD: index of accessibility to radial highways TAX: full-value property-tax rate per $10,000 PTRATIO: pupil-teacher ratio by town B: 1000(Bk - 0.63)^2 where Bk is the proportion of b. by town LSTAT: % lower status of the population Number of samples: 506 Target variable (continuous): MEDV, median value of owner-occupied homes in $1000's","title":"Overview"},{"location":"user_guide/data/boston_housing_data/#references","text":"Source: https://archive.ics.uci.edu/ml/datasets/Housing Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.","title":"References"},{"location":"user_guide/data/boston_housing_data/#example-1-dataset-overview","text":"from mlxtend.data import boston_housing_data X, y = boston_housing_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('1st row', X[0]) Dimensions: 506 x 13 1st row [ 6.32000000e-03 1.80000000e+01 2.31000000e+00 0.00000000e+00 5.38000000e-01 6.57500000e+00 6.52000000e+01 4.09000000e+00 1.00000000e+00 2.96000000e+02 1.53000000e+01 3.96900000e+02 4.98000000e+00]","title":"Example 1 - Dataset overview"},{"location":"user_guide/data/boston_housing_data/#api","text":"boston_housing_data() Boston Housing dataset. Source : https://archive.ics.uci.edu/ml/datasets/Housing Number of samples : 506 Continuous target variable : MEDV MEDV = Median value of owner-occupied homes in $1000's Dataset Attributes: 1) CRIM per capita crime rate by town 2) ZN proportion of residential land zoned for lots over 25,000 sq.ft. 
3) INDUS proportion of non-retail business acres per town 4) CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) 5) NOX nitric oxides concentration (parts per 10 million) 6) RM average number of rooms per dwelling 7) AGE proportion of owner-occupied units built prior to 1940 8) DIS weighted distances to five Boston employment centres 9) RAD index of accessibility to radial highways 10) TAX full-value property-tax rate per $10,000 11) PTRATIO pupil-teacher ratio by town 12) B 1000(Bk - 0.63)^2 where Bk is the prop. of b. by town 13) LSTAT % lower status of the population Returns X, y : [n_samples, n_features], [n_samples] X is the feature matrix with 506 housing samples as rows and 13 feature columns. y is a 1-dimensional array of the continuous target variable MEDV. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/boston_housing_data/","title":"API"},{"location":"user_guide/data/iris_data/","text":"Iris Dataset A function that loads the iris dataset into NumPy arrays. from mlxtend.data import iris_data Overview The Iris dataset for classification. Features Sepal length Sepal width Petal length Petal width Number of samples: 150 Target variable (discrete): {50x Setosa, 50x Versicolor, 50x Virginica} References Source: https://archive.ics.uci.edu/ml/datasets/Iris Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. 
Example 1 - Dataset overview from mlxtend.data import iris_data X, y = iris_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('\\nHeader: %s' % ['sepal length', 'sepal width', 'petal length', 'petal width']) print('1st row', X[0]) Dimensions: 150 x 4 Header: ['sepal length', 'sepal width', 'petal length', 'petal width'] 1st row [ 5.1 3.5 1.4 0.2] import numpy as np print('Classes: Setosa, Versicolor, Virginica') print(np.unique(y)) print('Class distribution: %s' % np.bincount(y)) Classes: Setosa, Versicolor, Virginica [0 1 2] Class distribution: [50 50 50] API iris_data() Iris flower dataset. Source : https://archive.ics.uci.edu/ml/datasets/Iris Number of samples : 150 Class labels : {0, 1, 2}, distribution: [50, 50, 50] 0 = setosa, 1 = versicolor, 2 = virginica. Dataset Attributes: 1) sepal length [cm] 2) sepal width [cm] 3) petal length [cm] 4) petal width [cm] Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 150 flower samples as rows, and 4 feature columns sepal length, sepal width, petal length, and petal width. y is a 1-dimensional array of the class labels {0, 1, 2} Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/iris_data/","title":"Iris Dataset"},{"location":"user_guide/data/iris_data/#iris-dataset","text":"A function that loads the iris dataset into NumPy arrays. from mlxtend.data import iris_data","title":"Iris Dataset"},{"location":"user_guide/data/iris_data/#overview","text":"The Iris dataset for classification. Features Sepal length Sepal width Petal length Petal width Number of samples: 150 Target variable (discrete): {50x Setosa, 50x Versicolor, 50x Virginica}","title":"Overview"},{"location":"user_guide/data/iris_data/#references","text":"Source: https://archive.ics.uci.edu/ml/datasets/Iris Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. 
Irvine, CA: University of California, School of Information and Computer Science.","title":"References"},{"location":"user_guide/data/iris_data/#example-1-dataset-overview","text":"from mlxtend.data import iris_data X, y = iris_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('\\nHeader: %s' % ['sepal length', 'sepal width', 'petal length', 'petal width']) print('1st row', X[0]) Dimensions: 150 x 4 Header: ['sepal length', 'sepal width', 'petal length', 'petal width'] 1st row [ 5.1 3.5 1.4 0.2] import numpy as np print('Classes: Setosa, Versicolor, Virginica') print(np.unique(y)) print('Class distribution: %s' % np.bincount(y)) Classes: Setosa, Versicolor, Virginica [0 1 2] Class distribution: [50 50 50]","title":"Example 1 - Dataset overview"},{"location":"user_guide/data/iris_data/#api","text":"iris_data() Iris flower dataset. Source : https://archive.ics.uci.edu/ml/datasets/Iris Number of samples : 150 Class labels : {0, 1, 2}, distribution: [50, 50, 50] 0 = setosa, 1 = versicolor, 2 = virginica. Dataset Attributes: 1) sepal length [cm] 2) sepal width [cm] 3) petal length [cm] 4) petal width [cm] Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 150 flower samples as rows, and 4 feature columns sepal length, sepal width, petal length, and petal width. y is a 1-dimensional array of the class labels {0, 1, 2} Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/iris_data/","title":"API"},{"location":"user_guide/data/loadlocal_mnist/","text":"Load the MNIST Dataset from Local Files A utility function that loads the MNIST dataset from byte-form into NumPy arrays. from mlxtend.data import loadlocal_mnist Overview The MNIST dataset was constructed from two datasets of the US National Institute of Standards and Technology (NIST). 
The training set consists of handwritten digits from 250 different people, 50 percent high school students, and 50 percent employees from the Census Bureau. Note that the test set contains handwritten digits from different people following the same split. The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist/ and consists of the following four parts: - Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB unzipped, and 60,000 samples) - Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB unzipped, and 60,000 labels) - Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB unzipped, and 10,000 samples) - Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB unzipped, and 10,000 labels) Features Each feature vector (row in the feature matrix) consists of 784 pixels (intensities) -- unrolled from the original 28x28 pixel images. Number of samples: 70,000 images (60,000 training and 10,000 test images) Target variable (discrete): digit class labels {0, 1, ..., 9} References Source: http://yann.lecun.com/exdb/mnist/ Y. LeCun and C. Cortes. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010. Example 1 Part 1 - Downloading the MNIST dataset 1) Download the MNIST files from Y. 
LeCun's website http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz for example, via curl -O http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz 2) Unzip the downloaded gzip archives for example, via gunzip t*-ubyte.gz Example 1 Part 2 - Loading MNIST into NumPy Arrays from mlxtend.data import loadlocal_mnist X, y = loadlocal_mnist( images_path='/Users/Sebastian/Desktop/train-images-idx3-ubyte', labels_path='/Users/Sebastian/Desktop/train-labels-idx1-ubyte') print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('\\n1st row', X[0]) Dimensions: 60000 x 784 1st row [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 18 18 18 126 136 175 26 166 255 247 127 0 0 0 0 0 0 0 0 0 0 0 0 30 36 94 154 170 253 253 253 253 253 225 172 253 242 195 64 0 0 0 0 0 0 0 0 0 0 0 49 238 253 253 253 253 253 253 253 253 251 93 82 82 56 39 0 0 0 0 0 0 0 0 0 0 0 0 18 219 253 253 253 253 253 198 182 247 241 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80 156 107 253 253 205 11 0 43 154 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 1 154 253 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 139 253 190 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 190 253 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 241 225 160 108 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 81 240 253 253 119 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45 186 253 253 150 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 93 252 253 187 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 249 253 249 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 46 130 183 253 253 
207 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 39 148 229 253 253 253 250 182 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24 114 221 253 253 253 253 201 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 66 213 253 253 253 253 198 81 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 171 219 253 253 253 253 195 80 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 55 172 226 253 253 253 253 244 133 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 136 253 253 253 212 135 132 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] import numpy as np print('Digits: 0 1 2 3 4 5 6 7 8 9') print('labels: %s' % np.unique(y)) print('Class distribution: %s' % np.bincount(y)) Digits: 0 1 2 3 4 5 6 7 8 9 labels: [0 1 2 3 4 5 6 7 8 9] Class distribution: [5923 6742 5958 6131 5842 5421 5918 6265 5851 5949] Store as CSV Files np.savetxt(fname='/Users/Sebastian/Desktop/images.csv', X=X, delimiter=',', fmt='%d') np.savetxt(fname='/Users/Sebastian/Desktop/labels.csv', X=y, delimiter=',', fmt='%d') API loadlocal_mnist(images_path, labels_path) Read MNIST from ubyte files. Parameters images_path : str path to the test or train MNIST ubyte file labels_path : str path to the test or train MNIST class labels file Returns images : [n_samples, n_pixels] numpy.array Pixel values of the images. labels : [n_samples] numpy array Target class labels Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/loadlocal_mnist/","title":"Load the MNIST Dataset from Local Files"},{"location":"user_guide/data/loadlocal_mnist/#load-the-mnist-dataset-from-local-files","text":"A utility function that loads the MNIST dataset from byte-form into NumPy arrays. 
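The ubyte files use the simple IDX binary format described on Y. LeCun's MNIST page: a big-endian header (a magic number, the item count and, for image files, the row and column counts) followed by raw unsigned bytes. The sketch below illustrates how such files can be decoded with struct and NumPy; it is not mlxtend's implementation, and the helper names parse_idx_images and parse_idx_labels are hypothetical. To stay self-contained, it parses tiny synthetic byte strings instead of the real files:

```python
import struct

import numpy as np


def parse_idx_labels(buf):
    # IDX1 label file: big-endian uint32 magic number (2049) and item
    # count, followed by one unsigned byte per label.
    magic, n = struct.unpack('>II', buf[:8])
    assert magic == 2049, 'not an IDX1 label file'
    return np.frombuffer(buf, dtype=np.uint8, count=n, offset=8)


def parse_idx_images(buf):
    # IDX3 image file: magic number (2051), image count, rows and cols as
    # big-endian uint32, followed by rows * cols unsigned bytes per image.
    magic, n, rows, cols = struct.unpack('>IIII', buf[:16])
    assert magic == 2051, 'not an IDX3 image file'
    pixels = np.frombuffer(buf, dtype=np.uint8, count=n * rows * cols, offset=16)
    # Unroll each rows x cols image into one row of the feature matrix.
    return pixels.reshape(n, rows * cols)


# Tiny synthetic files: two 2x2 images and their two labels.
img_buf = struct.pack('>IIII', 2051, 2, 2, 2) + bytes([0, 255, 128, 3, 1, 2, 3, 4])
lbl_buf = struct.pack('>II', 2049, 2) + bytes([7, 1])

X = parse_idx_images(img_buf)
y = parse_idx_labels(lbl_buf)
print(X.shape, y)  # (2, 4) [7 1]
```

For the real (unzipped) files, the same helpers would be called on the file contents, for example parse_idx_images(open(path, 'rb').read()), which for the training images should yield a 60000 x 784 array.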
from mlxtend.data import loadlocal_mnist","title":"Load the MNIST Dataset from Local Files"},{"location":"user_guide/data/loadlocal_mnist/#overview","text":"The MNIST dataset was constructed from two datasets of the US National Institute of Standards and Technology (NIST). The training set consists of handwritten digits from 250 different people, 50 percent high school students, and 50 percent employees from the Census Bureau. Note that the test set contains handwritten digits from different people following the same split. The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist/ and consists of the following four parts: - Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB unzipped, and 60,000 samples) - Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB unzipped, and 60,000 labels) - Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB unzipped, and 10,000 samples) - Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB unzipped, and 10,000 labels) Features Each feature vector (row in the feature matrix) consists of 784 pixels (intensities) -- unrolled from the original 28x28 pixel images. Number of samples: 70,000 images (60,000 training and 10,000 test images) Target variable (discrete): digit class labels {0, 1, ..., 9}","title":"Overview"},{"location":"user_guide/data/loadlocal_mnist/#references","text":"Source: http://yann.lecun.com/exdb/mnist/ Y. LeCun and C. Cortes. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010.","title":"References"},{"location":"user_guide/data/loadlocal_mnist/#example-1-part-1-downloading-the-mnist-dataset","text":"1) Download the MNIST files from Y. 
LeCun's website http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz for example, via curl -O http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz 2) Unzip the downloaded gzip archives for example, via gunzip t*-ubyte.gz","title":"Example 1 Part 1 - Downloading the MNIST dataset"},{"location":"user_guide/data/loadlocal_mnist/#example-1-part-2-loading-mnist-into-numpy-arrays","text":"from mlxtend.data import loadlocal_mnist X, y = loadlocal_mnist( images_path='/Users/Sebastian/Desktop/train-images-idx3-ubyte', labels_path='/Users/Sebastian/Desktop/train-labels-idx1-ubyte') print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('\\n1st row', X[0]) Dimensions: 60000 x 784 1st row [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 18 18 18 126 136 175 26 166 255 247 127 0 0 0 0 0 0 0 0 0 0 0 0 30 36 94 154 170 253 253 253 253 253 225 172 253 242 195 64 0 0 0 0 0 0 0 0 0 0 0 49 238 253 253 253 253 253 253 253 253 251 93 82 82 56 39 0 0 0 0 0 0 0 0 0 0 0 0 18 219 253 253 253 253 253 198 182 247 241 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80 156 107 253 253 205 11 0 43 154 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 1 154 253 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 139 253 190 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 190 253 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 241 225 160 108 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 81 240 253 253 119 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45 186 253 253 150 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 93 252 253 187 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 249 253 249 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 46 130 183 253 253 207 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 39 148 229 253 253 253 250 182 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24 114 221 253 253 253 253 201 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 66 213 253 253 253 253 198 81 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 171 219 253 253 253 253 195 80 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 55 172 226 253 253 253 253 244 133 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 136 253 253 253 212 135 132 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] import numpy as np print('Digits: 0 1 2 3 4 5 6 7 8 9') print('labels: %s' % np.unique(y)) print('Class distribution: %s' % np.bincount(y)) Digits: 0 1 2 3 4 5 6 7 8 9 labels: [0 1 2 3 4 5 6 7 8 9] Class distribution: [5923 6742 5958 6131 5842 5421 5918 6265 5851 5949]","title":"Example 1 Part 2 - Loading MNIST into NumPy Arrays"},{"location":"user_guide/data/loadlocal_mnist/#store-as-csv-files","text":"np.savetxt(fname='/Users/Sebastian/Desktop/images.csv', X=X, delimiter=',', fmt='%d') np.savetxt(fname='/Users/Sebastian/Desktop/labels.csv', X=y, delimiter=',', fmt='%d')","title":"Store as CSV Files"},{"location":"user_guide/data/loadlocal_mnist/#api","text":"loadlocal_mnist(images_path, labels_path) Read MNIST from ubyte files. Parameters images_path : str path to the test or train MNIST ubyte file labels_path : str path to the test or train MNIST class labels file Returns images : [n_samples, n_pixels] numpy.array Pixel values of the images. 
labels : [n_samples] numpy array Target class labels Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/loadlocal_mnist/","title":"API"},{"location":"user_guide/data/make_multiplexer_dataset/","text":"Make Multiplexer Dataset Function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. from mlxtend.data import make_multiplexer_dataset Overview The make_multiplexer_dataset function creates a dataset generated by an n-bit Boolean multiplexer. Such a dataset is generated by a simple rule based on the behavior of an electronic multiplexer, yet it presents a relatively challenging classification problem for supervised learning algorithms because of the interactions between features (epistasis), as they may be encountered in many real-world scenarios [1]. The following illustration depicts a 6-bit multiplexer that consists of 2 address bits and 4 register bits. Converted to decimal, the address bits point to a position in the register bits. For example, if the address bits are \"00\" (0 in decimal), they point to the register bit at position 0. The value of the register bit pointed to determines the class label: if the register bit at position 0 is 0, the class label is 0; vice versa, if the register bit at position 0 is 1, the class label is 1. In the example above, the address bits \"10\" (2 in decimal) point to the 3rd register position (as we start counting from index 0), which has a bit value of 1. Hence, the class label is 1. 
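The labeling rule fits in a few lines of Python. The sketch below assumes, as in the feature rows of Example 1, that each sample lists the address bits first (most significant bit first), followed by the register bits; multiplexer_label is a hypothetical helper, not part of mlxtend:

```python
def multiplexer_label(bits, address_bits=2):
    # Split a sample into its address bits and register bits.
    address = bits[:address_bits]
    register = bits[address_bits:]
    assert len(register) == 2 ** address_bits, 'wrong number of register bits'
    # Read the address bits as a binary number (most significant bit first).
    index = int(''.join(str(b) for b in address), 2)
    # The class label is the register bit at that position.
    return register[index]


# Address bits [1, 0] = 2 in decimal, so register position 2 decides.
print(multiplexer_label([1, 0, 0, 0, 1, 1]))  # 1
# Address bits [0, 1] = 1 in decimal, so register position 1 decides.
print(multiplexer_label([0, 1, 1, 0, 1, 1]))  # 0
```

Applied to the address/register pairs listed in the examples that follow, this rule reproduces each of the stated class labels.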
Below are a few more examples: Address bits: [0, 1], register bits: [1, 0, 1, 1], class label: 0 Address bits: [0, 1], register bits: [1, 1, 1, 0], class label: 1 Address bits: [1, 0], register bits: [1, 0, 0, 1], class label: 0 Address bits: [1, 1], register bits: [1, 1, 1, 0], class label: 0 Address bits: [0, 1], register bits: [0, 1, 1, 0], class label: 1 Address bits: [0, 1], register bits: [1, 0, 0, 1], class label: 0 Address bits: [0, 1], register bits: [0, 1, 1, 1], class label: 1 Address bits: [0, 1], register bits: [0, 0, 0, 0], class label: 0 Address bits: [1, 0], register bits: [1, 0, 1, 1], class label: 1 Address bits: [0, 1], register bits: [1, 1, 1, 1], class label: 1 Note that in the implementation of the multiplexer function, setting the number of address bits to 2 results in a 6-bit multiplexer, since two bits can address 2^2=4 different register positions (2 bits + 4 bits = 6 bits). If we choose 3 address bits instead, 2^3=8 positions are covered, resulting in an 11-bit (3 bits + 8 bits = 11 bits) multiplexer, and so forth. References [1] Urbanowicz, R. J., & Browne, W. N. (2017). Introduction to Learning Classifier Systems. Springer. Example 1 -- 6-bit multiplexer This simple example illustrates how to create a dataset from a 6-bit multiplexer. import numpy as np from mlxtend.data import make_multiplexer_dataset X, y = make_multiplexer_dataset(address_bits=2, sample_size=10, positive_class_ratio=0.5, shuffle=False, random_seed=123) print('Features:\\n', X) print('\\nClass labels:\\n', y) Features: [[0 1 0 1 0 1] [1 0 0 0 1 1] [0 1 1 1 0 0] [0 1 1 1 0 0] [0 0 1 1 0 0] [0 1 0 0 0 0] [0 1 1 0 1 1] [1 0 1 0 0 0] [1 0 0 1 0 1] [1 0 1 0 0 1]] Class labels: [1 1 1 1 1 0 0 0 0 0] API make_multiplexer_dataset(address_bits=2, sample_size=100, positive_class_ratio=0.5, shuffle=False, random_seed=None) Function to create a binary n-bit multiplexer dataset. 
New in mlxtend v0.9 Parameters address_bits : int (default: 2) A positive integer that determines the number of address bits in the multiplexer, which in turn determines the n-bit capacity of the multiplexer and therefore the number of features. For example, 2 address bits result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). If address_bits=3, then this results in an 11-bit multiplexer (3 + 2^3 = 11) with 11 features. sample_size : int (default: 100) The total number of samples generated. positive_class_ratio : float (default: 0.5) The fraction (a float between 0 and 1) of samples in the dataset of size sample_size that have class label 1. If positive_class_ratio=0.5 (default), then the ratio of class 0 and class 1 samples is perfectly balanced. shuffle : Bool (default: False) Whether or not to shuffle the features and labels. If False (default), the samples are returned sorted by class label, with the class 1 samples first, followed by the class 0 samples. random_seed : int (default: None) Random seed used for generating the multiplexer samples and shuffling. Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with the number of samples equal to sample_size. The number of features is determined by the number of address bits. For instance, 2 address bits result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). All features are binary (values in {0, 1}). y is a 1-dimensional array of class labels in {0, 1}. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/make_multiplexer_dataset","title":"Make Multiplexer Dataset"},{"location":"user_guide/data/make_multiplexer_dataset/#make-multiplexer-dataset","text":"Function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms.
from mlxtend.data import make_multiplexer_dataset","title":"Make Multiplexer Dataset"},{"location":"user_guide/data/make_multiplexer_dataset/#overview","text":"The make_multiplexer_dataset function creates a dataset generated by an n-bit Boolean multiplexer. Such a dataset is generated by a simple rule based on the behavior of an electronic multiplexer, yet it presents a relatively challenging classification problem for supervised learning algorithms because of the interactions between features (epistasis), as they may be encountered in many real-world scenarios [1]. The following illustration depicts a 6-bit multiplexer that consists of 2 address bits and 4 register bits. Converted to decimal, the address bits point to a position in the register bits. For example, if the address bits are \"00\" (0 in decimal), they point to the register bit at position 0. The value of the register bit pointed to determines the class label: if the register bit at position 0 is 0, the class label is 0; vice versa, if the register bit at position 0 is 1, the class label is 1. In the example above, the address bits \"10\" (2 in decimal) point to the 3rd register position (as we start counting from index 0), which has a bit value of 1. Hence, the class label is 1. 
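Because k address bits select one of 2^k register bits, a multiplexer with k address bits has k + 2^k input bits, and over all possible inputs exactly half of them carry class label 1. The brute-force sketch below checks both facts for small k; it assumes the address bits come first, and the function names n_features and count_positive are hypothetical, not part of mlxtend:

```python
from itertools import product


def n_features(address_bits):
    # k address bits can address 2**k register bits.
    return address_bits + 2 ** address_bits


def count_positive(address_bits):
    # Enumerate every possible input and count those whose
    # addressed register bit is 1.
    n = n_features(address_bits)
    positives = 0
    for bits in product([0, 1], repeat=n):
        index = int(''.join(map(str, bits[:address_bits])), 2)
        positives += bits[address_bits + index]
    return n, 2 ** n, positives


for k in (1, 2, 3):
    print(k, count_positive(k))
# 1 (3, 8, 4)
# 2 (6, 64, 32)
# 3 (11, 2048, 1024)
```

The balance holds because, for any fixed address, the selected register bit is 0 in exactly half of the remaining bit patterns.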
Below are a few more examples: Address bits: [0, 1], register bits: [1, 0, 1, 1], class label: 0 Address bits: [0, 1], register bits: [1, 1, 1, 0], class label: 1 Address bits: [1, 0], register bits: [1, 0, 0, 1], class label: 0 Address bits: [1, 1], register bits: [1, 1, 1, 0], class label: 0 Address bits: [0, 1], register bits: [0, 1, 1, 0], class label: 1 Address bits: [0, 1], register bits: [1, 0, 0, 1], class label: 0 Address bits: [0, 1], register bits: [0, 1, 1, 1], class label: 1 Address bits: [0, 1], register bits: [0, 0, 0, 0], class label: 0 Address bits: [1, 0], register bits: [1, 0, 1, 1], class label: 1 Address bits: [0, 1], register bits: [1, 1, 1, 1], class label: 1 Note that in the implementation of the multiplexer function, setting the number of address bits to 2 results in a 6-bit multiplexer, since two bits can address 2^2=4 different register positions (2 bits + 4 bits = 6 bits). If we choose 3 address bits instead, 2^3=8 positions are covered, resulting in an 11-bit (3 bits + 8 bits = 11 bits) multiplexer, and so forth.","title":"Overview"},{"location":"user_guide/data/make_multiplexer_dataset/#references","text":"[1] Urbanowicz, R. J., & Browne, W. N. (2017). Introduction to Learning Classifier Systems. 
Springer.","title":"References"},{"location":"user_guide/data/make_multiplexer_dataset/#example-1-6-bit-multiplexer","text":"This simple example illustrates how to create a dataset from a 6-bit multiplexer. import numpy as np from mlxtend.data import make_multiplexer_dataset X, y = make_multiplexer_dataset(address_bits=2, sample_size=10, positive_class_ratio=0.5, shuffle=False, random_seed=123) print('Features:\\n', X) print('\\nClass labels:\\n', y) Features: [[0 1 0 1 0 1] [1 0 0 0 1 1] [0 1 1 1 0 0] [0 1 1 1 0 0] [0 0 1 1 0 0] [0 1 0 0 0 0] [0 1 1 0 1 1] [1 0 1 0 0 0] [1 0 0 1 0 1] [1 0 1 0 0 1]] Class labels: [1 1 1 1 1 0 0 0 0 0]","title":"Example 1 -- 6-bit multiplexer"},{"location":"user_guide/data/make_multiplexer_dataset/#api","text":"make_multiplexer_dataset(address_bits=2, sample_size=100, positive_class_ratio=0.5, shuffle=False, random_seed=None) Function to create a binary n-bit multiplexer dataset. New in mlxtend v0.9 Parameters address_bits : int (default: 2) A positive integer that determines the number of address bits in the multiplexer, which in turn determines the n-bit capacity of the multiplexer and therefore the number of features. For example, 2 address bits result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). If address_bits=3, then this results in an 11-bit multiplexer (3 + 2^3 = 11) with 11 features. sample_size : int (default: 100) The total number of samples generated. positive_class_ratio : float (default: 0.5) The fraction (a float between 0 and 1) of samples in the dataset of size sample_size that have class label 1. If positive_class_ratio=0.5 (default), then the ratio of class 0 and class 1 samples is perfectly balanced. shuffle : Bool (default: False) Whether or not to shuffle the features and labels. 
If False (default), the samples are returned sorted by class label, with the class 1 samples first, followed by the class 0 samples. random_seed : int (default: None) Random seed used for generating the multiplexer samples and shuffling. Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with the number of samples equal to sample_size. The number of features is determined by the number of address bits. For instance, 2 address bits result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). All features are binary (values in {0, 1}). y is a 1-dimensional array of class labels in {0, 1}. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/make_multiplexer_dataset","title":"API"},{"location":"user_guide/data/mnist_data/","text":"MNIST Dataset A function that loads the MNIST dataset into NumPy arrays. from mlxtend.data import mnist_data Overview The MNIST dataset was constructed from two datasets of the US National Institute of Standards and Technology (NIST). The training set consists of handwritten digits from 250 different people, 50 percent high school students, and 50 percent employees from the Census Bureau. Note that the test set contains handwritten digits from different people following the same split. Features Each feature vector (row in the feature matrix) consists of 784 pixels (intensities) -- unrolled from the original 28x28 pixel images. Number of samples: A subset of 5000 images (the first 500 digits of each class) Target variable (discrete): {500x 0, ..., 500x 9} References Source: http://yann.lecun.com/exdb/mnist/ Y. LeCun and C. Cortes. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010.
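To illustrate what unrolled means here, the NumPy sketch below flattens a hypothetical random 28x28 image row by row into a 784-element feature vector and then restores it. The row-major (C-order) layout is an assumption, consistent with the reshape(28,28) call in the Visualize MNIST example on this page; under it, pixel (r, c) ends up at index 28*r + c:

```python
import numpy as np

# Hypothetical 28x28 grayscale image with pixel values in [0, 255].
rng = np.random.RandomState(0)
img = rng.randint(0, 256, size=(28, 28))

# Unroll row by row (C order) into a 784-dimensional feature vector.
x = img.reshape(-1)
print(x.shape)  # (784,)

# Pixel (r, c) of the image is feature 28 * r + c of the vector.
r, c = 3, 17
print(x[28 * r + c] == img[r, c])  # True

# Restore the original image layout, for example for plotting.
img_back = x.reshape(28, 28)
print(np.array_equal(img, img_back))  # True
```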
Example 1 - Dataset overview from mlxtend.data import mnist_data X, y = mnist_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('1st row', X[0]) Dimensions: 5000 x 784 1st row [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 51. 159. 253. 159. 50. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 48. 238. 252. 252. 252. 237. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 54. 227. 253. 252. 239. 233. 252. 57. 6. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 10. 60. 224. 252. 253. 252. 202. 84. 252. 253. 122. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 163. 252. 252. 252. 253. 252. 252. 96. 189. 253. 167. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 51. 238. 253. 253. 190. 114. 253. 228. 47. 79. 255. 168. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 48. 238. 252. 252. 179. 12. 75. 121. 21. 0. 0. 253. 243. 50. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 38. 165. 253. 233. 208. 84. 0. 0. 0. 0. 0. 0. 253. 252. 165. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 7. 178. 252. 240. 71. 19. 28. 0. 0. 0. 0. 0. 0. 253. 252. 195. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 57. 252. 252. 63. 0. 0. 0. 0. 0. 0. 0. 0. 0. 253. 252. 195. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 198. 253. 190. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 255. 253. 196. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 76. 246. 252. 112. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 253. 252. 148. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 230. 25. 0. 0. 0. 0. 0. 0. 0. 0. 7. 135. 253. 186. 12. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 223. 0. 0. 0. 0. 0. 0. 0. 0. 7. 131. 252. 225. 71. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 145. 0. 0. 0. 0. 0. 0. 0. 48. 165. 252. 173. 0. 0. 0. 0. 
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 86. 253. 225. 0. 0. 0. 0. 0. 0. 114. 238. 253. 162. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 249. 146. 48. 29. 85. 178. 225. 253. 223. 167. 56. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 252. 252. 229. 215. 252. 252. 252. 196. 130. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 28. 199. 252. 252. 253. 252. 252. 233. 145. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 25. 128. 252. 253. 252. 141. 37. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] import numpy as np print('Classes: Setosa, Versicolor, Virginica') print(np.unique(y)) print('Class distribution: %s' % np.bincount(y)) Classes: Setosa, Versicolor, Virginica [0 1 2 3 4 5 6 7 8 9] Class distribution: [500 500 500 500 500 500 500 500 500 500] Example 2 - Visualize MNIST %matplotlib inline import matplotlib.pyplot as plt def plot_digit(X, y, idx): img = X[idx].reshape(28,28) plt.imshow(img, cmap='Greys', interpolation='nearest') plt.title('true label: %d' % y[idx]) plt.show() plot_digit(X, y, 4) API mnist_data() 5000 samples from the MNIST handwritten digits dataset. Data Source : http://yann.lecun.com/exdb/mnist/ Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 5000 image samples as rows, each row consists of 28x28 pixels that were unrolled into 784 pixel feature vectors. y contains the 10 unique class labels 0-9. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/mnist_data/","title":"MNIST Dataset"},{"location":"user_guide/data/mnist_data/#mnist-dataset","text":"A function that loads the MNIST dataset into NumPy arrays. 
from mlxtend.data import mnist_data","title":"MNIST Dataset"},{"location":"user_guide/data/mnist_data/#overview","text":"The MNIST dataset was constructed from two datasets of the US National Institute of Standards and Technology (NIST). The training set consists of handwritten digits from 250 different people, 50 percent high school students, and 50 percent employees from the Census Bureau. Note that the test set contains handwritten digits from different people following the same split. Features Each feature vector (row in the feature matrix) consists of 784 pixels (intensities) -- unrolled from the original 28x28 pixels images. Number of samples: A subset of 5000 images (the first 500 digits of each class) Target variable (discrete): {500x 0, ..., 500x 9}","title":"Overview"},{"location":"user_guide/data/mnist_data/#references","text":"Source: http://yann.lecun.com/exdb/mnist/ Y. LeCun and C. Cortes. Mnist handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist , 2010.","title":"References"},{"location":"user_guide/data/mnist_data/#example-1-dataset-overview","text":"from mlxtend.data import mnist_data X, y = mnist_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('1st row', X[0]) Dimensions: 5000 x 784 1st row [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 51. 159. 253. 159. 50. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 48. 238. 252. 252. 252. 237. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 54. 227. 253. 252. 239. 233. 252. 57. 6. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 10. 60. 224. 252. 253. 252. 202. 84. 252. 253. 122. 0. 
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 163. 252. 252. 252. 253. 252. 252. 96. 189. 253. 167. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 51. 238. 253. 253. 190. 114. 253. 228. 47. 79. 255. 168. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 48. 238. 252. 252. 179. 12. 75. 121. 21. 0. 0. 253. 243. 50. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 38. 165. 253. 233. 208. 84. 0. 0. 0. 0. 0. 0. 253. 252. 165. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 7. 178. 252. 240. 71. 19. 28. 0. 0. 0. 0. 0. 0. 253. 252. 195. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 57. 252. 252. 63. 0. 0. 0. 0. 0. 0. 0. 0. 0. 253. 252. 195. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 198. 253. 190. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 255. 253. 196. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 76. 246. 252. 112. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 253. 252. 148. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 230. 25. 0. 0. 0. 0. 0. 0. 0. 0. 7. 135. 253. 186. 12. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 223. 0. 0. 0. 0. 0. 0. 0. 0. 7. 131. 252. 225. 71. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 145. 0. 0. 0. 0. 0. 0. 0. 48. 165. 252. 173. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 86. 253. 225. 0. 0. 0. 0. 0. 0. 114. 238. 253. 162. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 249. 146. 48. 29. 85. 178. 225. 253. 223. 167. 56. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 85. 252. 252. 252. 229. 215. 252. 252. 252. 196. 130. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 28. 199. 252. 252. 253. 252. 252. 233. 145. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 25. 128. 252. 253. 252. 141. 37. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 
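The overview describes this data as the first 500 digits of each class. That selection rule can be sketched with the standard library. The helper first_n_per_class is hypothetical and not part of mlxtend, and the toy label list stands in for the real MNIST labels.

```python
from collections import defaultdict

def first_n_per_class(labels, n):
    # Return the indices of the first n occurrences of each label,
    # keeping the original order of the dataset.
    seen = defaultdict(int)
    picked = []
    for idx, label in enumerate(labels):
        if seen[label] < n:
            seen[label] += 1
            picked.append(idx)
    return picked

# Toy labels (hypothetical): keep at most 2 examples per class.
labels = [3, 1, 3, 0, 1, 3, 0, 1]
print(first_n_per_class(labels, 2))  # [0, 1, 2, 3, 4, 6]
```

Applied with n=500 to labels in their original order, this rule yields 500 rows per class, matching the class distribution printed by the check that follows.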
import numpy as np print('Classes: digits 0-9') print(np.unique(y)) print('Class distribution: %s' % np.bincount(y)) Classes: digits 0-9 [0 1 2 3 4 5 6 7 8 9] Class distribution: [500 500 500 500 500 500 500 500 500 500]","title":"Example 1 - Dataset overview"},{"location":"user_guide/data/mnist_data/#example-2-visualize-mnist","text":"%matplotlib inline import matplotlib.pyplot as plt def plot_digit(X, y, idx): img = X[idx].reshape(28,28) plt.imshow(img, cmap='Greys', interpolation='nearest') plt.title('true label: %d' % y[idx]) plt.show() plot_digit(X, y, 4)","title":"Example 2 - Visualize MNIST"},{"location":"user_guide/data/mnist_data/#api","text":"mnist_data() 5000 samples from the MNIST handwritten digits dataset. Data Source : http://yann.lecun.com/exdb/mnist/ Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 5000 image samples as rows; each row consists of 28x28 pixels unrolled into a 784-pixel feature vector. y contains the 10 unique class labels 0-9. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/mnist_data/","title":"API"},{"location":"user_guide/data/three_blobs_data/","text":"Three Blobs Dataset A function that loads the three_blobs dataset into NumPy arrays. from mlxtend.data import three_blobs_data Overview A random dataset of 3 2D blobs for clustering. 
Number of samples : 150 Suggested labels \\in {0, 1, 2}, distribution: [50, 50, 50] References Example 1 - Dataset overview from mlxtend.data import three_blobs_data X, y = three_blobs_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('1st row', X[0]) Dimensions: 150 x 2 1st row [ 2.60509732 1.22529553] import numpy as np print('Suggested cluster labels') print(np.unique(y)) print('Label distribution: %s' % np.bincount(y)) Suggested cluster labels [0 1 2] Label distribution: [50 50 50] import matplotlib.pyplot as plt plt.scatter(X[:,0], X[:,1], c='white', marker='o', s=50) plt.grid() plt.show() plt.scatter(X[y == 0, 0], X[y == 0, 1], s=50, c='lightgreen', marker='s', label='cluster 1') plt.scatter(X[y == 1,0], X[y == 1,1], s=50, c='orange', marker='o', label='cluster 2') plt.scatter(X[y == 2,0], X[y == 2,1], s=50, c='lightblue', marker='v', label='cluster 3') plt.legend(loc='lower left') plt.grid() plt.show() API three_blobs_data() A random dataset of 3 2D blobs for clustering. Number of samples : 150 Suggested labels : {0, 1, 2}, distribution: [50, 50, 50] Returns X, y : [n_samples, n_features], [n_cluster_labels] X is the feature matrix with 150 samples as rows and 2 feature columns. y is a 1-dimensional array of the 3 suggested cluster labels 0, 1, 2. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/three_blobs_data","title":"Three Blobs Dataset"},{"location":"user_guide/data/three_blobs_data/#three-blobs-dataset","text":"A function that loads the three_blobs dataset into NumPy arrays. from mlxtend.data import three_blobs_data","title":"Three Blobs Dataset"},{"location":"user_guide/data/three_blobs_data/#overview","text":"A random dataset of 3 2D blobs for clustering. 
Number of samples : 150 Suggested labels \\in {0, 1, 2}, distribution: [50, 50, 50]","title":"Overview"},{"location":"user_guide/data/three_blobs_data/#references","text":"","title":"References"},{"location":"user_guide/data/three_blobs_data/#example-1-dataset-overview","text":"from mlxtend.data import three_blobs_data X, y = three_blobs_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('1st row', X[0]) Dimensions: 150 x 2 1st row [ 2.60509732 1.22529553] import numpy as np print('Suggested cluster labels') print(np.unique(y)) print('Label distribution: %s' % np.bincount(y)) Suggested cluster labels [0 1 2] Label distribution: [50 50 50] import matplotlib.pyplot as plt plt.scatter(X[:,0], X[:,1], c='white', marker='o', s=50) plt.grid() plt.show() plt.scatter(X[y == 0, 0], X[y == 0, 1], s=50, c='lightgreen', marker='s', label='cluster 1') plt.scatter(X[y == 1,0], X[y == 1,1], s=50, c='orange', marker='o', label='cluster 2') plt.scatter(X[y == 2,0], X[y == 2,1], s=50, c='lightblue', marker='v', label='cluster 3') plt.legend(loc='lower left') plt.grid() plt.show()","title":"Example 1 - Dataset overview"},{"location":"user_guide/data/three_blobs_data/#api","text":"three_blobs_data() A random dataset of 3 2D blobs for clustering. Number of samples : 150 Suggested labels : {0, 1, 2}, distribution: [50, 50, 50] Returns X, y : [n_samples, n_features], [n_cluster_labels] X is the feature matrix with 150 samples as rows and 2 feature columns. y is a 1-dimensional array of the 3 suggested cluster labels 0, 1, 2. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/three_blobs_data","title":"API"},{"location":"user_guide/data/wine_data/","text":"Wine Dataset A function that loads the Wine dataset into NumPy arrays. from mlxtend.data import wine_data Overview The Wine dataset for classification. 
Samples 178 Features 13 Classes 3 Data Set Characteristics: Multivariate Attribute Characteristics: Integer, Real Associated Tasks: Classification Missing Values None column attribute 1) Class Label 2) Alcohol 3) Malic acid 4) Ash 5) Alcalinity of ash 6) Magnesium 7) Total phenols 8) Flavanoids 9) Nonflavanoid phenols 10) Proanthocyanins 11) Color intensity 12) Hue 13) OD280/OD315 of diluted wines 14) Proline class samples 0 59 1 71 2 48 References Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy. Source: https://archive.ics.uci.edu/ml/datasets/Wine Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. Example 1 - Dataset overview from mlxtend.data import wine_data X, y = wine_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('\\nHeader: %s' % ['alcohol', 'malic acid', 'ash', 'ash alcalinity', 'magnesium', 'total phenols', 'flavanoids', 'nonflavanoid phenols', 'proanthocyanins', 'color intensity', 'hue', 'OD280/OD315 of diluted wines', 'proline']) print('1st row', X[0]) Dimensions: 178 x 13 Header: ['alcohol', 'malic acid', 'ash', 'ash alcalinity', 'magnesium', 'total phenols', 'flavanoids', 'nonflavanoid phenols', 'proanthocyanins', 'color intensity', 'hue', 'OD280/OD315 of diluted wines', 'proline'] 1st row [ 1.42300000e+01 1.71000000e+00 2.43000000e+00 1.56000000e+01 1.27000000e+02 2.80000000e+00 3.06000000e+00 2.80000000e-01 2.29000000e+00 5.64000000e+00 1.04000000e+00 3.92000000e+00 1.06500000e+03] import numpy as np print('Classes: %s' % np.unique(y)) print('Class distribution: %s' % np.bincount(y)) Classes: [0 1 2] Class distribution: [59 71 48] API wine_data() Wine dataset. 
Source : https://archive.ics.uci.edu/ml/datasets/Wine Number of samples : 178 Class labels : {0, 1, 2}, distribution: [59, 71, 48] Dataset Attributes: 1) Alcohol 2) Malic acid 3) Ash 4) Alcalinity of ash 5) Magnesium 6) Total phenols 7) Flavanoids 8) Nonflavanoid phenols 9) Proanthocyanins 10) Color intensity 11) Hue 12) OD280/OD315 of diluted wines 13) Proline Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 178 wine samples as rows and 13 feature columns. y is a 1-dimensional array of the 3 class labels 0, 1, 2 Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/wine_data","title":"Wine Dataset"},{"location":"user_guide/data/wine_data/#wine-dataset","text":"A function that loads the Wine dataset into NumPy arrays. from mlxtend.data import wine_data","title":"Wine Dataset"},{"location":"user_guide/data/wine_data/#overview","text":"The Wine dataset for classification. Samples 178 Features 13 Classes 3 Data Set Characteristics: Multivariate Attribute Characteristics: Integer, Real Associated Tasks: Classification Missing Values None column attribute 1) Class Label 2) Alcohol 3) Malic acid 4) Ash 5) Alcalinity of ash 6) Magnesium 7) Total phenols 8) Flavanoids 9) Nonflavanoid phenols 10) Proanthocyanins 11) Color intensity 12) Hue 13) OD280/OD315 of diluted wines 14) Proline class samples 0 59 1 71 2 48","title":"Overview"},{"location":"user_guide/data/wine_data/#references","text":"Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy. Source: https://archive.ics.uci.edu/ml/datasets/Wine Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. 
Irvine, CA: University of California, School of Information and Computer Science.","title":"References"},{"location":"user_guide/data/wine_data/#example-1-dataset-overview","text":"from mlxtend.data import wine_data X, y = wine_data() print('Dimensions: %s x %s' % (X.shape[0], X.shape[1])) print('\\nHeader: %s' % ['alcohol', 'malic acid', 'ash', 'ash alcalinity', 'magnesium', 'total phenols', 'flavanoids', 'nonflavanoid phenols', 'proanthocyanins', 'color intensity', 'hue', 'OD280/OD315 of diluted wines', 'proline']) print('1st row', X[0]) Dimensions: 178 x 13 Header: ['alcohol', 'malic acid', 'ash', 'ash alcalinity', 'magnesium', 'total phenols', 'flavanoids', 'nonflavanoid phenols', 'proanthocyanins', 'color intensity', 'hue', 'OD280/OD315 of diluted wines', 'proline'] 1st row [ 1.42300000e+01 1.71000000e+00 2.43000000e+00 1.56000000e+01 1.27000000e+02 2.80000000e+00 3.06000000e+00 2.80000000e-01 2.29000000e+00 5.64000000e+00 1.04000000e+00 3.92000000e+00 1.06500000e+03] import numpy as np print('Classes: %s' % np.unique(y)) print('Class distribution: %s' % np.bincount(y)) Classes: [0 1 2] Class distribution: [59 71 48]","title":"Example 1 - Dataset overview"},{"location":"user_guide/data/wine_data/#api","text":"wine_data() Wine dataset. Source : https://archive.ics.uci.edu/ml/datasets/Wine Number of samples : 178 Class labels : {0, 1, 2}, distribution: [59, 71, 48] Dataset Attributes: 1) Alcohol 2) Malic acid 3) Ash 4) Alcalinity of ash 5) Magnesium 6) Total phenols 7) Flavanoids 8) Nonflavanoid phenols 9) Proanthocyanins 10) Color intensity 11) Hue 12) OD280/OD315 of diluted wines 13) Proline Returns X, y : [n_samples, n_features], [n_class_labels] X is the feature matrix with 178 wine samples as rows and 13 feature columns. 
y is a 1-dimensional array of the 3 class labels 0, 1, 2. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/data/wine_data","title":"API"},{"location":"user_guide/evaluate/BootstrapOutOfBag/","text":"BootstrapOutOfBag An implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. from mlxtend.evaluate import BootstrapOutOfBag Overview Originally, the bootstrap method was developed to determine the statistical properties of an estimator when the underlying distribution is unknown and additional samples are not available. To use this method for evaluating predictive models, such as hypotheses for classification and regression, we may prefer a slightly different approach to bootstrapping, the so-called Out-Of-Bag (OOB) or Leave-One-Out Bootstrap (LOOB) technique. Here, we use out-of-bag samples as test sets for evaluation instead of evaluating the model on the training data. Out-of-bag samples are the instances that were not drawn into a given bootstrap sample and are therefore not used for model fitting, as shown in the figure below [1]. The figure illustrates what three random bootstrap samples drawn from an exemplary ten-sample dataset ( X_1, X_2, ..., X_{10} ) and their out-of-bag samples for testing might look like. In practice, Bradley Efron and Robert Tibshirani recommend drawing 50 to 200 bootstrap samples as being sufficient for reliable estimates [2]. References [1] https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html [2] Efron, Bradley, and Robert J. Tibshirani. An introduction to the bootstrap. CRC Press, 1994. 
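The out-of-bag procedure from the overview can be sketched in plain Python: each round draws n indices with replacement as the training set and uses the indices that were never drawn as the test set. This is a minimal illustration under that assumption, not mlxtend's BootstrapOutOfBag implementation; the generator name oob_split is hypothetical.

```python
import random

def oob_split(n_samples, n_splits, seed=None):
    # Yield (train, test) index lists for each bootstrap round.
    rng = random.Random(seed)
    for _ in range(n_splits):
        # Bootstrap sample: n_samples draws with replacement,
        # so some indices repeat and others are never drawn.
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(train)
        # Out-of-bag sample: every index not drawn in this round.
        test = [i for i in range(n_samples) if i not in drawn]
        yield train, test

for train, test in oob_split(5, n_splits=3, seed=1):
    print(train, test)
```

On very small datasets a round can occasionally draw every index, leaving an empty out-of-bag set, so a practical implementation needs to handle that case.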
Example 1 -- Evaluating the predictive performance of a model The BootstrapOutOfBag class mimics the behavior of scikit-learn's cross-validation classes, e.g., KFold: from mlxtend.evaluate import BootstrapOutOfBag import numpy as np oob = BootstrapOutOfBag(n_splits=3) for train, test in oob.split(np.array([1, 2, 3, 4, 5])): print(train, test) [4 2 1 3 3] [0] [2 4 1 2 1] [0 3] [4 3 3 4 1] [0 2] Consequently, we can pass BootstrapOutOfBag objects to the cv parameter of the cross_val_score function: from sklearn.datasets import load_iris from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score iris = load_iris() X = iris.data y = iris.target lr = LogisticRegression() print(cross_val_score(lr, X, y)) [ 0.96078431 0.92156863 0.95833333] print(cross_val_score(lr, X, y, cv=BootstrapOutOfBag(n_splits=3, random_seed=456))) [ 0.92727273 0.96226415 0.94444444] In practice, it is recommended to run at least 200 iterations, though: print('Mean accuracy: %.1f%%' % np.mean(100*cross_val_score( lr, X, y, cv=BootstrapOutOfBag(n_splits=200, random_seed=456)))) Mean accuracy: 94.8% Using the bootstrap, we can apply the percentile method to compute confidence bounds for the performance estimate. We pick our lower and upper confidence bounds as follows: ACC_{lower} = \\alpha_1th percentile of the ACC_{boot} distribution ACC_{upper} = \\alpha_2th percentile of the ACC_{boot} distribution where \\alpha_1 = \\alpha and \\alpha_2 = 1-\\alpha , and \\alpha sets the degree of confidence of the resulting 100 \\times (1-2 \\times \\alpha) confidence interval. For instance, to compute a 95% confidence interval, we pick \\alpha=0.025 to obtain the 2.5th and 97.5th percentiles of the distribution of the b bootstrap accuracies as the lower and upper confidence bounds. 
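The percentile method above reduces to sorting the bootstrap accuracies and reading off the alpha and 1-alpha quantiles. A stdlib sketch follows, assuming linear interpolation between neighboring order statistics (the default convention of numpy.percentile). The helper names percentile and percentile_ci, and the accuracy values, are made up for illustration.

```python
import math

def percentile(values, q):
    # Linear-interpolation percentile for q in [0, 100].
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    if lo == hi:
        return data[lo]
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac

def percentile_ci(accuracies, alpha=0.025):
    # Lower bound: alpha-th quantile; upper bound: (1 - alpha)-th quantile,
    # giving a 100 * (1 - 2 * alpha) percent confidence interval.
    lower = percentile(accuracies, 100 * alpha)
    upper = percentile(accuracies, 100 * (1 - alpha))
    return lower, upper

# Illustrative (made-up) bootstrap accuracies.
accs = [0.91, 0.93, 0.94, 0.95, 0.96, 0.92, 0.97, 0.94, 0.95, 0.93]
print(percentile_ci(accs))
```

With alpha=0.025 this returns the 2.5th and 97.5th percentiles, i.e., the bounds of a 95% interval.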
import matplotlib.pyplot as plt %matplotlib inline accuracies = cross_val_score(lr, X, y, cv=BootstrapOutOfBag(n_splits=1000, random_seed=456)) mean = np.mean(accuracies) lower = np.percentile(accuracies, 2.5) upper = np.percentile(accuracies, 97.5) fig, ax = plt.subplots(figsize=(8, 4)) ax.vlines(mean, [0], 40, lw=2.5, linestyle='-', label='mean') ax.vlines(lower, [0], 15, lw=2.5, linestyle='-.', label='CI95 percentile') ax.vlines(upper, [0], 15, lw=2.5, linestyle='-.') ax.hist(accuracies, bins=11, color='#0080ff', edgecolor=\"none\", alpha=0.3) plt.legend(loc='upper left') plt.show() API BootstrapOutOfBag(n_splits=200, random_seed=None) Parameters n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. Returns train_idx : ndarray The training set indices for that split. test_idx : ndarray The testing set indices for that split. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/BootstrapOutOfBag/ Methods get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility with scikit-learn. y : object Always ignored, exists for compatibility with scikit-learn. groups : object Always ignored, exists for compatibility with scikit-learn. Returns n_splits : int Returns the number of splitting iterations in the cross-validator. split(X, y=None, groups=None) y : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn. 
groups : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn.","title":"BootstrapOutOfBag"},{"location":"user_guide/evaluate/BootstrapOutOfBag/#bootstrapoutofbag","text":"An implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. from mlxtend.evaluate import BootstrapOutOfBag","title":"BootstrapOutOfBag"},{"location":"user_guide/evaluate/BootstrapOutOfBag/#overview","text":"Originally, the bootstrap method was developed to determine the statistical properties of an estimator when the underlying distribution is unknown and additional samples are not available. To use this method for evaluating predictive models, such as hypotheses for classification and regression, we may prefer a slightly different approach to bootstrapping, the so-called Out-Of-Bag (OOB) or Leave-One-Out Bootstrap (LOOB) technique. Here, we use out-of-bag samples as test sets for evaluation instead of evaluating the model on the training data. Out-of-bag samples are the instances that were not drawn into a given bootstrap sample and are therefore not used for model fitting, as shown in the figure below [1]. The figure illustrates what three random bootstrap samples drawn from an exemplary ten-sample dataset ( X_1, X_2, ..., X_{10} ) and their out-of-bag samples for testing might look like. In practice, Bradley Efron and Robert Tibshirani recommend drawing 50 to 200 bootstrap samples as being sufficient for reliable estimates [2].","title":"Overview"},{"location":"user_guide/evaluate/BootstrapOutOfBag/#references","text":"[1] https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html [2] Efron, Bradley, and Robert J. Tibshirani. An introduction to the bootstrap. CRC Press, 1994. 
","title":"References"},{"location":"user_guide/evaluate/BootstrapOutOfBag/#example-1-evaluating-the-predictive-performance-of-a-model","text":"The BootstrapOutOfBag class mimics the behavior of scikit-learn's cross-validation classes, e.g., KFold: from mlxtend.evaluate import BootstrapOutOfBag import numpy as np oob = BootstrapOutOfBag(n_splits=3) for train, test in oob.split(np.array([1, 2, 3, 4, 5])): print(train, test) [4 2 1 3 3] [0] [2 4 1 2 1] [0 3] [4 3 3 4 1] [0 2] Consequently, we can pass BootstrapOutOfBag objects to the cv parameter of the cross_val_score function: from sklearn.datasets import load_iris from sklearn.linear_model import LogisticRegression from sklearn.model_selection import cross_val_score iris = load_iris() X = iris.data y = iris.target lr = LogisticRegression() print(cross_val_score(lr, X, y)) [ 0.96078431 0.92156863 0.95833333] print(cross_val_score(lr, X, y, cv=BootstrapOutOfBag(n_splits=3, random_seed=456))) [ 0.92727273 0.96226415 0.94444444] In practice, it is recommended to run at least 200 iterations, though: print('Mean accuracy: %.1f%%' % np.mean(100*cross_val_score( lr, X, y, cv=BootstrapOutOfBag(n_splits=200, random_seed=456)))) Mean accuracy: 94.8% Using the bootstrap, we can apply the percentile method to compute confidence bounds for the performance estimate. We pick our lower and upper confidence bounds as follows: ACC_{lower} = \\alpha_1th percentile of the ACC_{boot} distribution ACC_{upper} = \\alpha_2th percentile of the ACC_{boot} distribution where \\alpha_1 = \\alpha and \\alpha_2 = 1-\\alpha , and \\alpha sets the degree of confidence of the resulting 100 \\times (1-2 \\times \\alpha) confidence interval. For instance, to compute a 95% confidence interval, we pick \\alpha=0.025 to obtain the 2.5th and 97.5th percentiles of the distribution of the b bootstrap accuracies as the lower and upper confidence bounds. 
import matplotlib.pyplot as plt %matplotlib inline accuracies = cross_val_score(lr, X, y, cv=BootstrapOutOfBag(n_splits=1000, random_seed=456)) mean = np.mean(accuracies) lower = np.percentile(accuracies, 2.5) upper = np.percentile(accuracies, 97.5) fig, ax = plt.subplots(figsize=(8, 4)) ax.vlines(mean, [0], 40, lw=2.5, linestyle='-', label='mean') ax.vlines(lower, [0], 15, lw=2.5, linestyle='-.', label='CI95 percentile') ax.vlines(upper, [0], 15, lw=2.5, linestyle='-.') ax.hist(accuracies, bins=11, color='#0080ff', edgecolor=\"none\", alpha=0.3) plt.legend(loc='upper left') plt.show()","title":"Example 1 -- Evaluating the predictive performance of a model"},{"location":"user_guide/evaluate/BootstrapOutOfBag/#api","text":"BootstrapOutOfBag(n_splits=200, random_seed=None) Parameters n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. Returns train_idx : ndarray The training set indices for that split. test_idx : ndarray The testing set indices for that split. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/BootstrapOutOfBag/","title":"API"},{"location":"user_guide/evaluate/BootstrapOutOfBag/#methods","text":"get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility with scikit-learn. y : object Always ignored, exists for compatibility with scikit-learn. groups : object Always ignored, exists for compatibility with scikit-learn. Returns n_splits : int Returns the number of splitting iterations in the cross-validator. split(X, y=None, groups=None) y : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn. 
groups : array-like or None (default: None) Argument is not used and only included as parameter for compatibility, similar to KFold in scikit-learn.","title":"Methods"},{"location":"user_guide/evaluate/PredefinedHoldoutSplit/","text":"PredefinedHoldoutSplit Split a dataset into a training and a validation subset based on user-specified indices. from mlxtend.evaluate import PredefinedHoldoutSplit Overview The PredefinedHoldoutSplit class serves as an alternative to scikit-learn's KFold class: it splits a dataset into a training and a validation subset without rotation, based on validation indices specified by the user. A PredefinedHoldoutSplit object can be passed as the cv argument to scikit-learn's GridSearchCV and similar classes. For performing a random split, see the related RandomHoldoutSplit class. Example 1 -- Iterating Over a PredefinedHoldoutSplit from mlxtend.evaluate import PredefinedHoldoutSplit from mlxtend.data import iris_data X, y = iris_data() h_iter = PredefinedHoldoutSplit(valid_indices=[0, 1, 99]) cnt = 0 for train_ind, valid_ind in h_iter.split(X, y): cnt += 1 print(cnt) 1 print(train_ind[:5]) print(valid_ind[:5]) [2 3 4 5 6] [ 0 1 99] Example 2 -- PredefinedHoldoutSplit in GridSearch from sklearn.model_selection import GridSearchCV from sklearn.neighbors import KNeighborsClassifier from mlxtend.evaluate import PredefinedHoldoutSplit from mlxtend.data import iris_data X, y = iris_data() params = {'n_neighbors': [1, 2, 3, 4, 5]} grid = GridSearchCV(KNeighborsClassifier(), param_grid=params, cv=PredefinedHoldoutSplit(valid_indices=[0, 1, 99])) grid.fit(X, y) assert grid.n_splits_ == 1 print(grid.grid_scores_) [mean: 1.00000, std: 0.00000, params: {'n_neighbors': 1}, mean: 1.00000, std: 0.00000, params: {'n_neighbors': 2}, mean: 1.00000, std: 0.00000, params: {'n_neighbors': 3}, mean: 1.00000, std: 0.00000, params: {'n_neighbors': 4}, mean: 1.00000, std: 0.00000, params: {'n_neighbors': 5}] 
/Users/sebastian/miniconda3/lib/python3.6/site-packages/sklearn/model_selection/_search.py:762: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20 DeprecationWarning) API PredefinedHoldoutSplit(valid_indices) Train/Validation set splitter for sklearn's GridSearchCV etc. Uses user-specified validation set indices to split a dataset into training and validation sets. Parameters valid_indices : array-like, shape (num_examples,) Indices of the training examples in the training set to be used for validation. All other indices in the training set are used as the training subset for model fitting. Methods get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator. Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and validation set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. valid_index : ndarray The validation set indices for that split.","title":"PredefinedHoldoutSplit"},{"location":"user_guide/evaluate/PredefinedHoldoutSplit/#predefinedholdoutsplit","text":"Split a dataset into a training and a validation subset based on user-specified indices. 
from mlxtend.evaluate import PredefinedHoldoutSplit","title":"PredefinedHoldoutSplit"},{"location":"user_guide/evaluate/PredefinedHoldoutSplit/#overview","text":"The PredefinedHoldoutSplit class serves as an alternative to scikit-learn's KFold class: it splits a dataset into a training and a validation subset without rotation, based on validation indices specified by the user. A PredefinedHoldoutSplit object can be passed as the cv argument to scikit-learn's GridSearchCV and similar classes. For performing a random split, see the related RandomHoldoutSplit class.","title":"Overview"},{"location":"user_guide/evaluate/PredefinedHoldoutSplit/#example-1-iterating-over-a-predefinedholdoutsplit","text":"from mlxtend.evaluate import PredefinedHoldoutSplit from mlxtend.data import iris_data X, y = iris_data() h_iter = PredefinedHoldoutSplit(valid_indices=[0, 1, 99]) cnt = 0 for train_ind, valid_ind in h_iter.split(X, y): cnt += 1 print(cnt) 1 print(train_ind[:5]) print(valid_ind[:5]) [2 3 4 5 6] [ 0 1 99]","title":"Example 1 -- Iterating Over a PredefinedHoldoutSplit"},{"location":"user_guide/evaluate/PredefinedHoldoutSplit/#example-2-predefinedholdoutsplit-in-gridsearch","text":"from sklearn.model_selection import GridSearchCV from sklearn.neighbors import KNeighborsClassifier from mlxtend.evaluate import PredefinedHoldoutSplit from mlxtend.data import iris_data X, y = iris_data() params = {'n_neighbors': [1, 2, 3, 4, 5]} grid = GridSearchCV(KNeighborsClassifier(), param_grid=params, cv=PredefinedHoldoutSplit(valid_indices=[0, 1, 99])) grid.fit(X, y) assert grid.n_splits_ == 1 print(grid.grid_scores_) [mean: 1.00000, std: 0.00000, params: {'n_neighbors': 1}, mean: 1.00000, std: 0.00000, params: {'n_neighbors': 2}, mean: 1.00000, std: 0.00000, params: {'n_neighbors': 3}, mean: 1.00000, std: 0.00000, params: {'n_neighbors': 4}, mean: 1.00000, std: 0.00000, params: {'n_neighbors': 5}] 
","title":"Example 2 -- PredefinedHoldoutSplit in GridSearch"},{"location":"user_guide/evaluate/PredefinedHoldoutSplit/#api","text":"PredefinedHoldoutSplit(valid_indices) Train/Validation set splitter for sklearn's GridSearchCV etc. Uses user-specified train/validation set indices to split a dataset into train/validation sets. Parameters valid_indices : array-like, shape (num_examples,) Indices of the training examples in the training set to be used for validation. All other indices in the training set are used to form a training subset for model fitting.","title":"API"},{"location":"user_guide/evaluate/PredefinedHoldoutSplit/#methods","text":"get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. 
valid_index : ndarray The validation set indices for that split.","title":"Methods"},{"location":"user_guide/evaluate/RandomHoldoutSplit/","text":"RandomHoldoutSplit Randomly split a dataset into a train and validation subset for validation. from mlxtend.evaluate import RandomHoldoutSplit Overview The RandomHoldoutSplit class serves as an alternative to scikit-learn's KFold class, where the RandomHoldoutSplit class splits a dataset into a training and a validation subset without rotation. The RandomHoldoutSplit can be used as the argument for the cv parameter of scikit-learn's GridSearchCV etc. The term \"random\" in RandomHoldoutSplit comes from the fact that the split is specified by the random_seed rather than specifying the training and validation set indices manually as in the PredefinedHoldoutSplit class in mlxtend. Example 1 -- Iterating Over a RandomHoldoutSplit from mlxtend.evaluate import RandomHoldoutSplit from mlxtend.data import iris_data X, y = iris_data() h_iter = RandomHoldoutSplit(valid_size=0.3, random_seed=123) cnt = 0 for train_ind, valid_ind in h_iter.split(X, y): cnt += 1 print(cnt) 1 print(train_ind[:5]) print(valid_ind[:5]) [ 60 16 88 130 6] [ 72 125 80 86 117] Example 2 -- RandomHoldoutSplit in GridSearch from sklearn.model_selection import GridSearchCV from sklearn.neighbors import KNeighborsClassifier from mlxtend.evaluate import RandomHoldoutSplit from mlxtend.data import iris_data X, y = iris_data() params = {'n_neighbors': [1, 2, 3, 4, 5]} grid = GridSearchCV(KNeighborsClassifier(), param_grid=params, cv=RandomHoldoutSplit(valid_size=0.3, random_seed=123)) grid.fit(X, y) assert grid.n_splits_ == 1 print(grid.grid_scores_) [mean: 0.95556, std: 0.00000, params: {'n_neighbors': 1}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 2}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 3}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 4}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 5}] 
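The random variant can be sketched in the same spirit: shuffle all indices with a seeded generator, then assign the first valid_size fraction to validation. The helper random_holdout_indices is hypothetical and uses Python's random module, so its indices will not match the RandomHoldoutSplit output shown above for random_seed=123, and it ignores stratification; it only illustrates the idea.

```python
import random


def random_holdout_indices(n_examples, valid_size=0.5, random_seed=None):
    # Hypothetical helper (not mlxtend's implementation): one shuffled
    # (train, validation) split that is reproducible for a fixed seed.
    rng = random.Random(random_seed)
    indices = list(range(n_examples))
    rng.shuffle(indices)
    # Number of validation examples, e.g., 0.3 * 150 -> 45.
    n_valid = int(round(valid_size * n_examples))
    yield indices[n_valid:], indices[:n_valid]


train_ind, valid_ind = next(
    random_holdout_indices(150, valid_size=0.3, random_seed=123))
print(len(train_ind), len(valid_ind))
```

Fixing the seed is what makes the split repeatable across calls, which is the same role random_seed plays in RandomHoldoutSplit.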
API RandomHoldoutSplit(valid_size=0.5, random_seed=None, stratify=False) Train/Validation set splitter for sklearn's GridSearchCV etc. Provides train/validation set indices to split a dataset into train/validation sets using random indices. Parameters valid_size : float (default: 0.5) Proportion of examples that are assigned as validation examples; the remaining 1 - valid_size are automatically assigned as training set examples. random_seed : int (default: None) The random seed for splitting the data into training and validation set partitions. stratify : bool (default: False) Whether to perform a stratified split. Methods get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of training examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. 
valid_index : ndarray The validation set indices for that split.","title":"RandomHoldoutSplit"},{"location":"user_guide/evaluate/RandomHoldoutSplit/#randomholdoutsplit","text":"Randomly split a dataset into a train and validation subset for validation. from mlxtend.evaluate import RandomHoldoutSplit","title":"RandomHoldoutSplit"},{"location":"user_guide/evaluate/RandomHoldoutSplit/#overview","text":"The RandomHoldoutSplit class serves as an alternative to scikit-learn's KFold class, where the RandomHoldoutSplit class splits a dataset into a training and a validation subset without rotation. The RandomHoldoutSplit can be used as the argument for the cv parameter of scikit-learn's GridSearchCV etc. The term \"random\" in RandomHoldoutSplit comes from the fact that the split is specified by the random_seed rather than specifying the training and validation set indices manually as in the PredefinedHoldoutSplit class in mlxtend.","title":"Overview"},{"location":"user_guide/evaluate/RandomHoldoutSplit/#example-1-iterating-over-a-randomholdoutsplit","text":"from mlxtend.evaluate import RandomHoldoutSplit from mlxtend.data import iris_data X, y = iris_data() h_iter = RandomHoldoutSplit(valid_size=0.3, random_seed=123) cnt = 0 for train_ind, valid_ind in h_iter.split(X, y): cnt += 1 print(cnt) 1 print(train_ind[:5]) print(valid_ind[:5]) [ 60 16 88 130 6] [ 72 125 80 86 117]","title":"Example 1 -- Iterating Over a RandomHoldoutSplit"},{"location":"user_guide/evaluate/RandomHoldoutSplit/#example-2-randomholdoutsplit-in-gridsearch","text":"from sklearn.model_selection import GridSearchCV from sklearn.neighbors import KNeighborsClassifier from mlxtend.evaluate import RandomHoldoutSplit from mlxtend.data import iris_data X, y = iris_data() params = {'n_neighbors': [1, 2, 3, 4, 5]} grid = GridSearchCV(KNeighborsClassifier(), param_grid=params, cv=RandomHoldoutSplit(valid_size=0.3, random_seed=123)) grid.fit(X, y) assert grid.n_splits_ == 1 print(grid.grid_scores_) [mean: 0.95556, std: 
0.00000, params: {'n_neighbors': 1}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 2}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 3}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 4}, mean: 0.95556, std: 0.00000, params: {'n_neighbors': 5}]","title":"Example 2 -- RandomHoldoutSplit in GridSearch"},{"location":"user_guide/evaluate/RandomHoldoutSplit/#api","text":"RandomHoldoutSplit(valid_size=0.5, random_seed=None, stratify=False) Train/Validation set splitter for sklearn's GridSearchCV etc. Provides train/validation set indices to split a dataset into train/validation sets using random indices. Parameters valid_size : float (default: 0.5) Proportion of examples that are assigned as validation examples; the remaining 1 - valid_size are automatically assigned as training set examples. random_seed : int (default: None) The random seed for splitting the data into training and validation set partitions. stratify : bool (default: False) Whether to perform a stratified split.","title":"API"},{"location":"user_guide/evaluate/RandomHoldoutSplit/#methods","text":"get_n_splits(X=None, y=None, groups=None) Returns the number of splitting iterations in the cross-validator Parameters X : object Always ignored, exists for compatibility. y : object Always ignored, exists for compatibility. groups : object Always ignored, exists for compatibility. Returns n_splits : 1 Returns the number of splitting iterations in the cross-validator. Always returns 1. split(X, y, groups=None) Generate indices to split data into training and test set. 
Parameters X : array-like, shape (num_examples, num_features) Training data, where num_examples is the number of training examples and num_features is the number of features. y : array-like, shape (num_examples,) The target variable for supervised learning problems. Stratification is done based on the y labels. groups : object Always ignored, exists for compatibility. Yields train_index : ndarray The training set indices for that split. valid_index : ndarray The validation set indices for that split.","title":"Methods"},{"location":"user_guide/evaluate/bootstrap/","text":"Bootstrap An implementation of the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth). from mlxtend.evaluate import bootstrap Overview The bootstrap offers an easy and effective way to estimate the distribution of a statistic via simulation, by drawing (or generating) new samples from an existing sample with replacement. Note that the bootstrap does not require making any assumptions about the sample statistic or dataset being normally distributed. Using the bootstrap, we can estimate sample statistics and compute the standard error of the mean and confidence intervals as if we had drawn a number of samples from an infinite population. In a nutshell, the bootstrap procedure can be described as follows: Draw a sample with replacement Compute the sample statistic Repeat steps 1-2 n times Compute the standard deviation of the statistics from step 2 (the standard error of the statistic) Compute the confidence interval Or, in simple terms, we can interpret the bootstrap as a means of drawing a potentially endless number of (new) samples from a population by resampling the original dataset. Note that the term \"bootstrap replicate\" is being used quite loosely in current literature; many researchers and practitioners use it to refer to the number of bootstrap samples we draw from the original dataset. 
However, in the context of this documentation and the code annotation, we use the original definition of bootstrap replicates, that is, the statistic computed from a bootstrap sample. References [1] Efron, Bradley, and Robert J. Tibshirani. An introduction to the bootstrap. CRC Press, 1994. Example 1 -- Bootstrapping the Mean This simple example illustrates how you could bootstrap the mean of a sample. import numpy as np from mlxtend.evaluate import bootstrap rng = np.random.RandomState(123) x = rng.normal(loc=5., size=100) original, std_err, ci_bounds = bootstrap(x, num_rounds=1000, func=np.mean, ci=0.95, seed=123) print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, std_err, ci_bounds[0], ci_bounds[1])) Mean: 5.03, SE: +/- 0.11, CI95: [4.80, 5.26] Example 2 - Bootstrapping a Regression Fit This example illustrates how you can bootstrap the R^2 of a regression fit on the training data. from mlxtend.data import autompg_data from sklearn.linear_model import LinearRegression from sklearn.metrics import r2_score X, y = autompg_data() lr = LinearRegression() def r2_fit(X, model=lr): x, y = X[:, 0].reshape(-1, 1), X[:, 1] pred = model.fit(x, y).predict(x) return r2_score(y, pred) original, std_err, ci_bounds = bootstrap(X, num_rounds=1000, func=r2_fit, ci=0.95, seed=123) print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, std_err, ci_bounds[0], ci_bounds[1])) Mean: 0.90, SE: +/- 0.01, CI95: [0.89, 0.92] API bootstrap(x, func, num_rounds=1000, ci=0.95, ddof=1, seed=None) Implements the ordinary nonparametric bootstrap Parameters x : NumPy array, shape=(n_samples, [n_columns]) A one- or multidimensional array of data records func : A function which computes a statistic that is used to compute the bootstrap replicates (the statistic computed from the bootstrap samples). This function must return a scalar value. 
For example, np.mean or np.median would be an acceptable argument for func if x is a 1-dimensional array or vector. num_rounds : int (default=1000) The number of bootstrap samples to draw where each bootstrap sample has the same number of records as the original dataset. ci : float (default=0.95) A float in the range (0, 1) that represents the confidence level for computing the confidence interval. For example, ci=0.95 (default) will compute the 95% confidence interval from the bootstrap replicates. ddof : int (default=1) The delta degrees of freedom used when computing the standard error. seed : int or None (default=None) Random seed for generating bootstrap samples. Returns original, standard_error, (lower_ci, upper_ci) : tuple Returns the statistic of the original sample ( original ), the standard error of the estimate, and the respective confidence interval bounds. Examples >>> import numpy as np >>> from mlxtend.evaluate import bootstrap >>> rng = np.random.RandomState(123) >>> x = rng.normal(loc=5., size=100) >>> original, std_err, ci_bounds = bootstrap(x, ... num_rounds=1000, ... func=np.mean, ... ci=0.95, ... seed=123) >>> print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, ... std_err, ... ci_bounds[0], ... ci_bounds[1])) Mean: 5.03, SE: +/- 0.11, CI95: [4.80, 5.26] >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap/","title":"Bootstrap"},{"location":"user_guide/evaluate/bootstrap/#bootstrap","text":"An implementation of the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth). from mlxtend.evaluate import bootstrap","title":"Bootstrap"},{"location":"user_guide/evaluate/bootstrap/#overview","text":"The bootstrap offers an easy and effective way to estimate the distribution of a statistic via simulation, by drawing (or generating) new samples from an existing sample with replacement. 
Note that the bootstrap does not require making any assumptions about the sample statistic or dataset being normally distributed. Using the bootstrap, we can estimate sample statistics and compute the standard error of the mean and confidence intervals as if we had drawn a number of samples from an infinite population. In a nutshell, the bootstrap procedure can be described as follows: Draw a sample with replacement Compute the sample statistic Repeat steps 1-2 n times Compute the standard deviation of the statistics from step 2 (the standard error of the statistic) Compute the confidence interval Or, in simple terms, we can interpret the bootstrap as a means of drawing a potentially endless number of (new) samples from a population by resampling the original dataset. Note that the term \"bootstrap replicate\" is being used quite loosely in current literature; many researchers and practitioners use it to refer to the number of bootstrap samples we draw from the original dataset. However, in the context of this documentation and the code annotation, we use the original definition of bootstrap replicates, that is, the statistic computed from a bootstrap sample.","title":"Overview"},{"location":"user_guide/evaluate/bootstrap/#references","text":"[1] Efron, Bradley, and Robert J. Tibshirani. An introduction to the bootstrap. CRC Press, 1994.","title":"References"},{"location":"user_guide/evaluate/bootstrap/#example-1-bootstrapping-the-mean","text":"This simple example illustrates how you could bootstrap the mean of a sample. 
import numpy as np from mlxtend.evaluate import bootstrap rng = np.random.RandomState(123) x = rng.normal(loc=5., size=100) original, std_err, ci_bounds = bootstrap(x, num_rounds=1000, func=np.mean, ci=0.95, seed=123) print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, std_err, ci_bounds[0], ci_bounds[1])) Mean: 5.03, SE: +/- 0.11, CI95: [4.80, 5.26]","title":"Example 1 -- Bootstrapping the Mean"},{"location":"user_guide/evaluate/bootstrap/#example-2-bootstrapping-a-regression-fit","text":"This example illustrates how you can bootstrap the R^2 of a regression fit on the training data. from mlxtend.data import autompg_data from sklearn.linear_model import LinearRegression from sklearn.metrics import r2_score X, y = autompg_data() lr = LinearRegression() def r2_fit(X, model=lr): x, y = X[:, 0].reshape(-1, 1), X[:, 1] pred = model.fit(x, y).predict(x) return r2_score(y, pred) original, std_err, ci_bounds = bootstrap(X, num_rounds=1000, func=r2_fit, ci=0.95, seed=123) print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, std_err, ci_bounds[0], ci_bounds[1])) Mean: 0.90, SE: +/- 0.01, CI95: [0.89, 0.92]","title":"Example 2 - Bootstrapping a Regression Fit"},{"location":"user_guide/evaluate/bootstrap/#api","text":"bootstrap(x, func, num_rounds=1000, ci=0.95, ddof=1, seed=None) Implements the ordinary nonparametric bootstrap Parameters x : NumPy array, shape=(n_samples, [n_columns]) A one- or multidimensional array of data records func : A function which computes a statistic that is used to compute the bootstrap replicates (the statistic computed from the bootstrap samples). This function must return a scalar value. For example, np.mean or np.median would be an acceptable argument for func if x is a 1-dimensional array or vector. num_rounds : int (default=1000) The number of bootstrap samples to draw where each bootstrap sample has the same number of records as the original dataset. 
ci : float (default=0.95) A float in the range (0, 1) that represents the confidence level for computing the confidence interval. For example, ci=0.95 (default) will compute the 95% confidence interval from the bootstrap replicates. ddof : int (default=1) The delta degrees of freedom used when computing the standard error. seed : int or None (default=None) Random seed for generating bootstrap samples. Returns original, standard_error, (lower_ci, upper_ci) : tuple Returns the statistic of the original sample ( original ), the standard error of the estimate, and the respective confidence interval bounds. Examples >>> import numpy as np >>> from mlxtend.evaluate import bootstrap >>> rng = np.random.RandomState(123) >>> x = rng.normal(loc=5., size=100) >>> original, std_err, ci_bounds = bootstrap(x, ... num_rounds=1000, ... func=np.mean, ... ci=0.95, ... seed=123) >>> print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, ... std_err, ... ci_bounds[0], ... ci_bounds[1])) Mean: 5.03, SE: +/- 0.11, CI95: [4.80, 5.26] >>> For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap/","title":"API"},{"location":"user_guide/evaluate/bootstrap_point632_score/","text":"bootstrap_point632_score An implementation of the .632 bootstrap to evaluate supervised learning algorithms. from mlxtend.evaluate import bootstrap_point632_score Overview Originally, the bootstrap method was designed to determine the statistical properties of an estimator when the underlying distribution is unknown and additional samples are not available. Now, in order to exploit this method for the evaluation of predictive models, such as hypotheses for classification and regression, we may prefer a slightly different approach to bootstrapping using the so-called Out-Of-Bag (OOB) or Leave-One-Out Bootstrap (LOOB) technique. Here, we use out-of-bag samples as test sets for evaluation instead of evaluating the model on the training data. 
Out-of-bag samples are the unique sets of instances that are not used for model fitting as shown in the figure below [1]. The figure above illustrates what three random bootstrap samples drawn from an exemplary ten-sample dataset ( X_1,X_2, ..., X_{10} ) and their out-of-bag samples for testing might look like. In practice, Bradley Efron and Robert Tibshirani recommend drawing 50 to 200 bootstrap samples as sufficient for reliable estimates [2]. .632 Bootstrap In 1983, Bradley Efron described the .632 Estimate, a further improvement to address the pessimistic bias of the bootstrap cross-validation approach described above [3]. The pessimistic bias in the \"classic\" bootstrap method can be attributed to the fact that the bootstrap samples only contain approximately 63.2% of the unique samples from the original dataset. For instance, we can compute the probability that a given sample from a dataset of size n is not drawn into a bootstrap sample as P (\\text{not chosen}) = \\bigg(1 - \\frac{1}{n}\\bigg)^n, which is asymptotically equivalent to \\frac{1}{e} \\approx 0.368 as n \\rightarrow \\infty. Conversely, we can then compute the probability that a sample is chosen as P (\\text{chosen}) = 1 - \\bigg(1 - \\frac{1}{n}\\bigg)^n \\approx 0.632 for reasonably large datasets, so that we'd select approximately 0.632 \\times n unique samples as bootstrap training sets and reserve 0.368 \\times n out-of-bag samples for testing in each iteration. Now, to address the bias that is due to sampling with replacement, Bradley Efron proposed the .632 Estimate that we mentioned earlier, which is computed via the following equation: \\text{ACC}_{boot} = \\frac{1}{b} \\sum_{i=1}^b \\big(0.632 \\cdot \\text{ACC}_{h, i} + 0.368 \\cdot \\text{ACC}_{r, i}\\big), where \\text{ACC}_{r, i} is the resubstitution accuracy, and \\text{ACC}_{h, i} is the accuracy on the out-of-bag sample. 
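To make the .632 formula concrete, the following sketch averages the weighted per-round accuracies over b bootstrap rounds. The accuracies are made-up numbers for illustration only, and point632_estimate is a hypothetical helper, not part of mlxtend's API.

```python
def point632_estimate(oob_accs, resub_accs, weight=0.632):
    # oob_accs[i] is ACC_h for round i (out-of-bag accuracy);
    # resub_accs[i] is ACC_r for round i (resubstitution accuracy).
    if not oob_accs or len(oob_accs) != len(resub_accs):
        raise ValueError('expected one OOB and one resubstitution accuracy per round')
    b = len(oob_accs)
    total = sum(weight * acc_h + (1 - weight) * acc_r
                for acc_h, acc_r in zip(oob_accs, resub_accs))
    return total / b


# Made-up accuracies from b = 3 bootstrap rounds.
oob_accs = [0.90, 0.92, 0.94]
resub_accs = [1.00, 1.00, 0.98]
est = point632_estimate(oob_accs, resub_accs)
print(f'.632 estimate: {est:.4f}')
```

Because the resubstitution accuracy is typically higher than the out-of-bag accuracy, the 0.368 weight on it pulls the estimate up from the purely out-of-bag value.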
.632+ Bootstrap Now, while the .632 Bootstrap attempts to address the pessimistic bias of the estimate, an optimistic bias may occur with models that tend to overfit, so Bradley Efron and Robert Tibshirani proposed the .632+ Bootstrap Method (Efron and Tibshirani, 1997). Instead of using a fixed \"weight\" \\omega = 0.632 in \\text{ACC}_{boot} = \\frac{1}{b} \\sum_{i=1}^b \\big(\\omega \\cdot \\text{ACC}_{h, i} + (1-\\omega) \\cdot \\text{ACC}_{r, i} \\big), we compute the weight \\omega as \\omega = \\frac{0.632}{1 - 0.368 \\times R}, where R is the relative overfitting rate R = \\frac{(-1) \\times (\\text{ACC}_{h, i} - \\text{ACC}_{r, i})}{\\gamma - (1 -\\text{ACC}_{r, i})}. (Since we are plugging \\omega into the equation for computing \\text{ACC}_{boot} that we defined above, \\text{ACC}_{h, i} and \\text{ACC}_{r, i} still refer to the out-of-bag and resubstitution accuracy estimates in the i-th bootstrap round, respectively.) Further, we need to determine the no-information rate \\gamma in order to compute R. For instance, we can compute \\gamma by fitting a model to a dataset that contains all possible combinations between samples x_{i'} and target class labels y_{i}; that is, we pretend that the observations and class labels are independent: \\gamma = \\frac{1}{n^2} \\sum_{i=1}^{n} \\sum_{i '=1}^{n} L(y_{i}, f(x_{i '})). Alternatively, we can estimate the no-information rate \\gamma as follows: \\gamma = \\sum_{k=1}^K p_k (1 - q_k), where p_k is the proportion of class k samples observed in the dataset, and q_k is the proportion of samples that the classifier predicts as class k. References [1] https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html [2] Efron, Bradley, and Robert J. Tibshirani. An introduction to the bootstrap. CRC Press, 1994. [3] Efron, Bradley. 1983. 
\u201cEstimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.\u201d Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636. [4] Efron, Bradley, and Robert Tibshirani. 1997. \u201cImprovements on Cross-Validation: The .632+ Bootstrap Method.\u201d Journal of the American Statistical Association 92 (438): 548. doi:10.2307/2965703. Example 1 -- Evaluating the predictive performance of a model via the classic out-of-bag Bootstrap The bootstrap_point632_score function mimics the behavior of scikit-learn's cross_val_score, and a typical usage example is shown below: from sklearn import datasets from sklearn.tree import DecisionTreeClassifier from mlxtend.evaluate import bootstrap_point632_score import numpy as np iris = datasets.load_iris() X = iris.data y = iris.target tree = DecisionTreeClassifier(random_state=0) # Model accuracy scores = bootstrap_point632_score(tree, X, y, method='oob') acc = np.mean(scores) print('Accuracy: %.2f%%' % (100*acc)) # Confidence interval lower = np.percentile(scores, 2.5) upper = np.percentile(scores, 97.5) print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper)) Accuracy: 94.52% 95% Confidence interval: [88.88, 98.28] Example 2 -- Evaluating the predictive performance of a model via the .632 Bootstrap from sklearn import datasets from sklearn.tree import DecisionTreeClassifier from mlxtend.evaluate import bootstrap_point632_score import numpy as np iris = datasets.load_iris() X = iris.data y = iris.target tree = DecisionTreeClassifier(random_state=0) # Model accuracy scores = bootstrap_point632_score(tree, X, y) acc = np.mean(scores) print('Accuracy: %.2f%%' % (100*acc)) # Confidence interval lower = np.percentile(scores, 2.5) upper = np.percentile(scores, 97.5) print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper)) Accuracy: 96.58% 95% Confidence interval: [92.37, 98.97] Example 3 -- Evaluating the predictive performance of a model via the .632+ 
Bootstrap from sklearn import datasets from sklearn.tree import DecisionTreeClassifier from mlxtend.evaluate import bootstrap_point632_score import numpy as np iris = datasets.load_iris() X = iris.data y = iris.target tree = DecisionTreeClassifier(random_state=0) # Model accuracy scores = bootstrap_point632_score(tree, X, y, method='.632+') acc = np.mean(scores) print('Accuracy: %.2f%%' % (100*acc)) # Confidence interval lower = np.percentile(scores, 2.5) upper = np.percentile(scores, 97.5) print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper)) Accuracy: 96.40% 95% Confidence interval: [92.34, 99.00] API bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632', scoring_func=None, random_seed=None, clone_estimator=True) Implementation of the .632 [1] and .632+ [2] bootstrap for supervised learning References: [1] Efron, Bradley. 1983. \u201cEstimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.\u201d Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636. [2] Efron, Bradley, and Robert Tibshirani. 1997. \u201cImprovements on Cross-Validation: The .632+ Bootstrap Method.\u201d Journal of the American Statistical Association 92 (438): 548. doi:10.2307/2965703. Parameters estimator : object An estimator for classification or regression that follows the scikit-learn API and implements \"fit\" and \"predict\" methods. X : array-like The data to fit. Can be, for example a list, or an array at least 2d. y : array-like, optional, default: None The target variable to try to predict in the case of supervised learning. n_splits : int (default=200) Number of bootstrap iterations. Must be larger than 1. method : str (default='.632') The bootstrap method, which can be either - 1) '.632' bootstrap (default) - 2) '.632+' bootstrap - 3) 'oob' (regular out-of-bag, no weighting) for comparison studies. 
scoring_func : callable, Score function (or loss function) with signature scoring_func(y, y_pred, **kwargs). If None, uses classification accuracy if the estimator is a classifier and mean squared error if the estimator is a regressor. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. clone_estimator : bool (default=True) Clones the estimator if True, otherwise fits the original. Returns scores : array of float, shape=(n_splits,) Array of scores of the estimator for each bootstrap replicate. Examples >>> import numpy as np >>> from sklearn import datasets, linear_model >>> from mlxtend.evaluate import bootstrap_point632_score >>> iris = datasets.load_iris() >>> X = iris.data >>> y = iris.target >>> lr = linear_model.LogisticRegression() >>> scores = bootstrap_point632_score(lr, X, y) >>> acc = np.mean(scores) >>> print('Accuracy:', acc) Accuracy: 0.953023146884 >>> lower = np.percentile(scores, 2.5) >>> upper = np.percentile(scores, 97.5) >>> print('95%% Confidence interval: [%.2f, %.2f]' % (lower, upper)) 95% Confidence interval: [0.90, 0.98] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap_point632_score/","title":"bootstrap_point632_score"},{"location":"user_guide/evaluate/bootstrap_point632_score/#bootstrap_point632_score","text":"An implementation of the .632 bootstrap to evaluate supervised learning algorithms. from mlxtend.evaluate import bootstrap_point632_score","title":"bootstrap_point632_score"},{"location":"user_guide/evaluate/bootstrap_point632_score/#overview","text":"Originally, the bootstrap method was designed to determine the statistical properties of an estimator when the underlying distribution is unknown and additional samples are not available. 
Now, in order to exploit this method for the evaluation of predictive models, such as hypotheses for classification and regression, we may prefer a slightly different approach to bootstrapping using the so-called Out-Of-Bag (OOB) or Leave-One-Out Bootstrap (LOOB) technique. Here, we use out-of-bag samples as test sets for evaluation instead of evaluating the model on the training data. Out-of-bag samples are the unique sets of instances that are not used for model fitting as shown in the figure below [1]. The figure above illustrates what three random bootstrap samples drawn from an exemplary ten-sample dataset ( X_1,X_2, ..., X_{10} ) and their out-of-bag samples for testing might look like. In practice, Bradley Efron and Robert Tibshirani recommend drawing 50 to 200 bootstrap samples as sufficient for reliable estimates [2].","title":"Overview"},{"location":"user_guide/evaluate/bootstrap_point632_score/#632-bootstrap","text":"In 1983, Bradley Efron described the .632 Estimate, a further improvement to address the pessimistic bias of the bootstrap cross-validation approach described above [3]. The pessimistic bias in the \"classic\" bootstrap method can be attributed to the fact that the bootstrap samples only contain approximately 63.2% of the unique samples from the original dataset. For instance, we can compute the probability that a given sample from a dataset of size n is not drawn into a bootstrap sample as P (\\text{not chosen}) = \\bigg(1 - \\frac{1}{n}\\bigg)^n, which is asymptotically equivalent to \\frac{1}{e} \\approx 0.368 as n \\rightarrow \\infty. Conversely, we can then compute the probability that a sample is chosen as P (\\text{chosen}) = 1 - \\bigg(1 - \\frac{1}{n}\\bigg)^n \\approx 0.632 for reasonably large datasets, so that we'd select approximately 0.632 \\times n unique samples as bootstrap training sets and reserve 0.368 \\times n out-of-bag samples for testing in each iteration. 
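A quick numerical check of these probabilities, written as a small stand-alone sketch (not part of mlxtend), evaluates (1 - 1/n)^n for growing n and compares the fractions of unique and out-of-bag examples in one simulated bootstrap sample with the 0.632 and 0.368 approximations. The simulated fractions depend on the seed and will only be close to these values, not equal to them.

```python
import math
import random


def prob_not_chosen(n):
    # P(a given example is never drawn in n draws with replacement).
    return (1 - 1 / n) ** n


for n in (10, 100, 1000, 100000):
    p = prob_not_chosen(n)
    print(f'n={n}: P(not chosen)={p:.4f}, P(chosen)={1 - p:.4f}')
print(f'1/e = {1 / math.e:.4f}')

# Monte Carlo check with one bootstrap sample of size n.
rng = random.Random(0)
n = 10000
bootstrap_sample = [rng.randrange(n) for _ in range(n)]
drawn = set(bootstrap_sample)
unique_fraction = len(drawn) / n
# Out-of-bag examples are the ones that were never drawn.
oob_fraction = sum(1 for i in range(n) if i not in drawn) / n
print(f'unique: {unique_fraction:.3f}, out-of-bag: {oob_fraction:.3f}')
```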
Now, to address the bias that is due to sampling with replacement, Bradley Efron proposed the .632 Estimate mentioned earlier, which is computed via the following equation: \\text{ACC}_{boot} = \\frac{1}{b} \\sum_{i=1}^b \\big(0.632 \\cdot \\text{ACC}_{h, i} + 0.368 \\cdot \\text{ACC}_{r, i}\\big), where \\text{ACC}_{r, i} is the resubstitution accuracy, and \\text{ACC}_{h, i} is the accuracy on the out-of-bag sample.","title":".632 Bootstrap"},{"location":"user_guide/evaluate/bootstrap_point632_score/#632-bootstrap_1","text":"While the .632 Bootstrap addresses the pessimistic bias of the estimate, an optimistic bias may occur with models that tend to overfit. To address this, Bradley Efron and Robert Tibshirani proposed the .632+ Bootstrap Method (Efron and Tibshirani, 1997) [4]. Instead of using a fixed \"weight\" \\omega = 0.632 in \\text{ACC}_{boot} = \\frac{1}{b} \\sum_{i=1}^b \\big(\\omega \\cdot \\text{ACC}_{h, i} + (1-\\omega) \\cdot \\text{ACC}_{r, i} \\big), we compute the weight \\omega as \\omega = \\frac{0.632}{1 - 0.368 \\times R}, where R is the relative overfitting rate R = \\frac{(-1) \\times (\\text{ACC}_{h, i} - \\text{ACC}_{r, i})}{\\gamma - (1 - \\text{ACC}_{r, i})}. (Since we are plugging \\omega into the equation for computing \\text{ACC}_{boot} defined above, \\text{ACC}_{h, i} and \\text{ACC}_{r, i} still refer to the out-of-bag and resubstitution accuracy estimates in the i-th bootstrap round, respectively.) Further, we need to determine the no-information rate \\gamma to compute R. For instance, we can compute \\gamma by fitting a model to a dataset that contains all possible combinations of samples x_{i'} and target class labels y_{i}; that is, we pretend that the observations and class labels are independent: \\gamma = \\frac{1}{n^2} \\sum_{i=1}^{n} \\sum_{i'=1}^{n} L(y_{i}, f(x_{i'})). 
Alternatively, we can estimate the no-information rate \\gamma as follows: \\gamma = \\sum_{k=1}^K p_k (1 - q_k), where p_k is the proportion of class k samples observed in the dataset, and q_k is the proportion of samples that the classifier assigns to class k.","title":".632+ Bootstrap"},{"location":"user_guide/evaluate/bootstrap_point632_score/#references","text":"[1] https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html [2] Efron, Bradley, and Robert J. Tibshirani. 1994. An Introduction to the Bootstrap. CRC Press. [3] Efron, Bradley. 1983. \u201cEstimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.\u201d Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636. [4] Efron, Bradley, and Robert Tibshirani. 1997. \u201cImprovements on Cross-Validation: The .632+ Bootstrap Method.\u201d Journal of the American Statistical Association 92 (438): 548. 
doi:10.2307/2965703.","title":"References"},{"location":"user_guide/evaluate/bootstrap_point632_score/#example-1-evaluating-the-predictive-performance-of-a-model-via-the-classic-out-of-bag-bootstrap","text":"The bootstrap_point632_score function mimics the behavior of scikit-learn's cross_val_score, and a typical usage example is shown below: from sklearn import datasets from sklearn.tree import DecisionTreeClassifier from mlxtend.evaluate import bootstrap_point632_score import numpy as np iris = datasets.load_iris() X = iris.data y = iris.target tree = DecisionTreeClassifier(random_state=0) # Model accuracy scores = bootstrap_point632_score(tree, X, y, method='oob') acc = np.mean(scores) print('Accuracy: %.2f%%' % (100*acc)) # Confidence interval lower = np.percentile(scores, 2.5) upper = np.percentile(scores, 97.5) print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper)) Accuracy: 94.52% 95% Confidence interval: [88.88, 98.28]","title":"Example 1 -- Evaluating the predictive performance of a model via the classic out-of-bag Bootstrap"},{"location":"user_guide/evaluate/bootstrap_point632_score/#example-2-evaluating-the-predictive-performance-of-a-model-via-the-632-bootstrap","text":"from sklearn import datasets from sklearn.tree import DecisionTreeClassifier from mlxtend.evaluate import bootstrap_point632_score import numpy as np iris = datasets.load_iris() X = iris.data y = iris.target tree = DecisionTreeClassifier(random_state=0) # Model accuracy scores = bootstrap_point632_score(tree, X, y) acc = np.mean(scores) print('Accuracy: %.2f%%' % (100*acc)) # Confidence interval lower = np.percentile(scores, 2.5) upper = np.percentile(scores, 97.5) print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper)) Accuracy: 96.58% 95% Confidence interval: [92.37, 98.97]","title":"Example 2 -- Evaluating the predictive performance of a model via the .632 
Bootstrap"},{"location":"user_guide/evaluate/bootstrap_point632_score/#example-3-evaluating-the-predictive-performance-of-a-model-via-the-632-bootstrap","text":"from sklearn import datasets from sklearn.tree import DecisionTreeClassifier from mlxtend.evaluate import bootstrap_point632_score import numpy as np iris = datasets.load_iris() X = iris.data y = iris.target tree = DecisionTreeClassifier(random_state=0) # Model accuracy scores = bootstrap_point632_score(tree, X, y, method='.632+') acc = np.mean(scores) print('Accuracy: %.2f%%' % (100*acc)) # Confidence interval lower = np.percentile(scores, 2.5) upper = np.percentile(scores, 97.5) print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper)) Accuracy: 96.40% 95% Confidence interval: [92.34, 99.00]","title":"Example 3 -- Evaluating the predictive performance of a model via the .632+ Bootstrap"},{"location":"user_guide/evaluate/bootstrap_point632_score/#api","text":"bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632', scoring_func=None, random_seed=None, clone_estimator=True) Implementation of the .632 [1] and .632+ [2] bootstrap for supervised learning. References: [1] Efron, Bradley. 1983. \u201cEstimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation.\u201d Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636. [2] Efron, Bradley, and Robert Tibshirani. 1997. \u201cImprovements on Cross-Validation: The .632+ Bootstrap Method.\u201d Journal of the American Statistical Association 92 (438): 548. doi:10.2307/2965703. Parameters estimator : object An estimator for classification or regression that follows the scikit-learn API and implements \"fit\" and \"predict\" methods. X : array-like The data to fit. Can be, for example, a list or an array with at least 2 dimensions. y : array-like, optional, default: None The target variable to try to predict in the case of supervised learning. n_splits : int (default=200) Number of bootstrap iterations. 
Must be larger than 1. method : str (default='.632') The bootstrap method, which can be one of: 1) '.632' bootstrap (default), 2) '.632+' bootstrap, or 3) 'oob' (regular out-of-bag, no weighting) for comparison studies. scoring_func : callable, Score function (or loss function) with signature scoring_func(y, y_pred, **kwargs). If None, uses classification accuracy if the estimator is a classifier and mean squared error if the estimator is a regressor. random_seed : int (default=None) If int, random_seed is the seed used by the random number generator. clone_estimator : bool (default=True) Clones the estimator if True; otherwise, fits the original. Returns scores : array of float, shape=(n_splits,) Array of scores of the estimator for each bootstrap replicate. Examples >>> import numpy as np >>> from sklearn import datasets, linear_model >>> from mlxtend.evaluate import bootstrap_point632_score >>> iris = datasets.load_iris() >>> X = iris.data >>> y = iris.target >>> lr = linear_model.LogisticRegression() >>> scores = bootstrap_point632_score(lr, X, y) >>> acc = np.mean(scores) >>> print('Accuracy:', acc) Accuracy: 0.953023146884 >>> lower = np.percentile(scores, 2.5) >>> upper = np.percentile(scores, 97.5) >>> print('95%% Confidence interval: [%.2f, %.2f]' % (lower, upper)) 95% Confidence interval: [0.90, 0.98] For more usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap_point632_score/","title":"API"},{"location":"user_guide/evaluate/cochrans_q/","text":"Cochran's Q Test Cochran's Q test for comparing the performance of multiple classifiers. from mlxtend.evaluate import cochrans_q Overview Cochran's Q test can be regarded as a generalized version of McNemar's test that can be applied to evaluate multiple classifiers. In a sense, Cochran's Q test is analogous to ANOVA for binary outcomes. 
To compare more than two classifiers, we can use Cochran's Q test, whose test statistic Q is, similar to McNemar's test, approximately chi-squared distributed with L-1 degrees of freedom, where L is the number of models we evaluate (since L=2 for McNemar's test, McNemar's test statistic approximates a chi-squared distribution with one degree of freedom). More formally, Cochran's Q test tests the null hypothesis that there is no difference between the classification accuracies p_i [1]: H_0: p_1 = p_2 = \\cdots = p_L. Let \\{D_1, \\dots , D_L\\} be a set of classifiers that have all been tested on the same dataset. If the L classifiers do not perform differently, then the following Q statistic is approximately chi-squared distributed with L-1 degrees of freedom: Q_C = (L-1) \\frac{L \\sum^{L}_{i=1}G_{i}^{2} - T^2}{LT - \\sum^{N_{ts}}_{j=1} (L_j)^2}. Here, G_i is the number of objects out of N_{ts} correctly classified by D_i, i = 1, \\dots, L; L_j is the number of classifiers out of L that correctly classified object \\mathbf{z}_j \\in \\mathbf{Z}_{ts}, where \\mathbf{Z}_{ts} = \\{\\mathbf{z}_1, \\dots, \\mathbf{z}_{N_{ts}}\\} is the test dataset on which the classifiers are evaluated; and T is the total number of correct votes among the L classifiers [2]: T = \\sum_{i=1}^{L} G_i = \\sum^{N_{ts}}_{j=1} L_j. To perform Cochran's Q test, we typically organize the classifier predictions in a binary N_{ts} \\times L matrix. The ij\\text{th} entry of this matrix is 0 if classifier D_j has misclassified a data example (vector) \\mathbf{z}_i and 1 otherwise (that is, if the classifier predicted the class label l(\\mathbf{z}_i) correctly) [2]. The following example taken from [2] illustrates how the classification results may be organized. 
For instance, assume we have the ground truth labels of the test dataset y_true and the following predictions by 3 classifiers (y_model_1, y_model_2, and y_model_3): y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_1 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_2 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_3 = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]) The table of correct (1) and incorrect (0) classifications may then look as follows, listing each combination of outcomes for D_1 (model 1), D_2 (model 2), and D_3 (model 3) together with its number of occurrences: (1, 1, 1): 80; (1, 1, 0): 2; (1, 0, 1): 0; (1, 0, 0): 2; (0, 1, 1): 9; (0, 1, 0): 1; (0, 0, 1): 3; (0, 0, 0): 3. The resulting accuracies are 84/100 = 84% for D_1, 92/100 = 92% for D_2, and 92/100 = 92% for D_3. By plugging the respective values into the previous equation, we obtain the following Q value [2]: Q_C = 2 \\times \\frac{3 \\times (84^2 + 92^2 + 92^2) - 268^2}{3\\times 268-(80 \\times 9 + 11 \\times 4 + 6 \\times 1)} \\approx 7.5294. (Note that the Q value in [2] is listed as 3.7647 due to a typo, as discussed with the author; 7.5294 is the correct value.) Now, the Q value (approximating \\chi^2) corresponds to a p-value of approximately 0.023, assuming a \\chi^2 distribution with L-1 = 2 degrees of freedom. Assuming that we chose a significance level of \\alpha=0.05, we would reject the null hypothesis that all classifiers perform equally well, since 0.023 < \\alpha. In practice, if we successfully rejected the null hypothesis, we could perform multiple post hoc pair-wise tests (for example, McNemar tests with a Bonferroni correction) to determine which pairs have different population proportions. References [1] Fleiss, Joseph L., Bruce Levin, and Myunghee Cho Paik. Statistical methods for rates and proportions. John Wiley & Sons, 2013. [2] Kuncheva, Ludmila I. Combining pattern classifiers: methods and algorithms. John Wiley & Sons, 2004. Example 1 - Cochran's Q test import numpy as np from mlxtend.evaluate import cochrans_q from mlxtend.evaluate import mcnemar_table from mlxtend.evaluate import mcnemar ## Dataset: # ground truth labels of the test dataset: y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) # predictions by 3 classifiers (`y_model_1`, `y_model_2`, and `y_model_3`): y_model_1 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_2 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_3 = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]) Assuming a significance level \\alpha=0.05, we can conduct Cochran's Q test as follows to test the null hypothesis that there is no difference between the classification accuracies p_i, H_0: p_1 = p_2 = \\cdots = p_L: q, p_value = cochrans_q(y_true, y_model_1, y_model_2, y_model_3) print('Q: %.3f' % q) print('p-value: %.3f' % p_value) Q: 7.529 p-value: 0.023 Since the p-value is smaller than \\alpha, we can reject the null hypothesis and conclude that there is a difference between the classification accuracies. As mentioned in the introduction, we could now perform multiple post hoc pair-wise tests (for example, McNemar tests with a Bonferroni correction) to determine which pairs have different population proportions. Lastly, let's illustrate that Cochran's Q test is indeed just a generalized version of McNemar's test: chi2, p_value = cochrans_q(y_true, y_model_1, y_model_2) print('Cochran\\'s Q Chi^2: %.3f' % chi2) print('Cochran\\'s Q p-value: %.3f' % p_value) Cochran's Q Chi^2: 5.333 Cochran's Q p-value: 0.021 chi2, p_value = mcnemar(mcnemar_table(y_true, y_model_1, y_model_2), corrected=False) print('McNemar\\'s Chi^2: %.3f' % chi2) print('McNemar\\'s p-value: %.3f' % p_value) McNemar's Chi^2: 5.333 McNemar's p-value: 0.021 API cochrans_q(y_target, *y_model_predictions) Cochran's Q test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. 
*y_model_predictions : array-like, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy arrays. Returns q, p : float or None, float Returns the Q (chi-squared) value and the p-value. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/cochrans_q/","title":"Cochran's Q Test"},{"location":"user_guide/evaluate/cochrans_q/#cochrans-q-test","text":"Cochran's Q test for comparing the performance of multiple classifiers. from mlxtend.evaluate import cochrans_q","title":"Cochran's Q Test"},{"location":"user_guide/evaluate/cochrans_q/#overview","text":"Cochran's Q test can be regarded as a generalized version of McNemar's test that can be applied to evaluate multiple classifiers. In a sense, Cochran's Q test is analogous to ANOVA for binary outcomes. To compare more than two classifiers, we can use Cochran's Q test, whose test statistic Q is, similar to McNemar's test, approximately chi-squared distributed with L-1 degrees of freedom, where L is the number of models we evaluate (since L=2 for McNemar's test, McNemar's test statistic approximates a chi-squared distribution with one degree of freedom). More formally, Cochran's Q test tests the null hypothesis that there is no difference between the classification accuracies p_i [1]: H_0: p_1 = p_2 = \\cdots = p_L. Let \\{D_1, \\dots , D_L\\} be a set of classifiers that have all been tested on the same dataset. If the L classifiers do not perform differently, then the following Q statistic is approximately chi-squared distributed with L-1 degrees of freedom: Q_C = (L-1) \\frac{L \\sum^{L}_{i=1}G_{i}^{2} - T^2}{LT - \\sum^{N_{ts}}_{j=1} (L_j)^2}. Here, G_i is the number of objects out of N_{ts} correctly classified by D_i, i = 1, \\dots, L; L_j is the number of classifiers out of L that correctly classified object \\mathbf{z}_j \\in \\mathbf{Z}_{ts}, where \\mathbf{Z}_{ts} = \\{\\mathbf{z}_1, \\dots, 
\\mathbf{z}_{N_{ts}}\\} is the test dataset on which the classifiers are evaluated; and T is the total number of correct votes among the L classifiers [2]: T = \\sum_{i=1}^{L} G_i = \\sum^{N_{ts}}_{j=1} L_j. To perform Cochran's Q test, we typically organize the classifier predictions in a binary N_{ts} \\times L matrix. The ij\\text{th} entry of this matrix is 0 if classifier D_j has misclassified a data example (vector) \\mathbf{z}_i and 1 otherwise (that is, if the classifier predicted the class label l(\\mathbf{z}_i) correctly) [2]. The following example taken from [2] illustrates how the classification results may be organized. For instance, assume we have the ground truth labels of the test dataset y_true and the following predictions by 3 classifiers (y_model_1, y_model_2, and y_model_3): y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_1 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_2 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_3 = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]) The table of correct (1) and incorrect (0) classifications may then look as follows, listing each combination of outcomes for D_1 (model 1), D_2 (model 2), and D_3 (model 3) together with its number of occurrences: (1, 1, 1): 80; (1, 1, 0): 2; (1, 0, 1): 0; (1, 0, 0): 2; (0, 1, 1): 9; (0, 1, 0): 1; (0, 0, 1): 3; (0, 0, 0): 3. The resulting accuracies are 84/100 = 84% for D_1, 92/100 = 92% for D_2, and 92/100 = 92% for D_3. By plugging the respective values into the previous equation, we obtain the following Q value [2]: Q_C = 2 \\times \\frac{3 \\times (84^2 + 92^2 + 92^2) - 268^2}{3\\times 268-(80 \\times 9 + 11 \\times 4 + 6 \\times 1)} \\approx 7.5294. (Note that the Q value in [2] is listed as 3.7647 due to a typo, as discussed with the author; 7.5294 is the correct value.) Now, the Q value (approximating \\chi^2) corresponds to a p-value of approximately 0.023, assuming a \\chi^2 distribution with L-1 = 2 degrees of freedom. Assuming that we chose a significance level of \\alpha=0.05, we would reject the null hypothesis that all classifiers perform equally well, since 0.023 < \\alpha. In practice, if we successfully rejected the null hypothesis, we could perform multiple post hoc pair-wise tests (for example, McNemar tests with a Bonferroni correction) to determine which pairs have different population proportions.","title":"Overview"},{"location":"user_guide/evaluate/cochrans_q/#references","text":"[1] Fleiss, Joseph L., Bruce Levin, and Myunghee Cho Paik. Statistical methods for rates and proportions. John Wiley & Sons, 2013. [2] Kuncheva, Ludmila I. Combining pattern classifiers: methods and algorithms. John Wiley & Sons, 2004.","title":"References"},{"location":"user_guide/evaluate/cochrans_q/#example-1-cochrans-q-test","text":"import numpy as np from mlxtend.evaluate import cochrans_q from mlxtend.evaluate import mcnemar_table from mlxtend.evaluate import mcnemar ## Dataset: # ground truth labels of the test dataset: y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) # predictions by 3 classifiers (`y_model_1`, `y_model_2`, and `y_model_3`): y_model_1 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_2 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_3 = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]) Assuming a significance level \\alpha=0.05, we can conduct Cochran's Q test as follows to test the null hypothesis that there is no difference between the classification accuracies p_i, H_0: p_1 = p_2 = \\cdots = p_L: q, p_value = cochrans_q(y_true, y_model_1, y_model_2, y_model_3) print('Q: %.3f' % q) 
print('p-value: %.3f' % p_value) Q: 7.529 p-value: 0.023 Since the p-value is smaller than \\alpha, we can reject the null hypothesis and conclude that there is a difference between the classification accuracies. As mentioned in the introduction, we could now perform multiple post hoc pair-wise tests (for example, McNemar tests with a Bonferroni correction) to determine which pairs have different population proportions. Lastly, let's illustrate that Cochran's Q test is indeed just a generalized version of McNemar's test: chi2, p_value = cochrans_q(y_true, y_model_1, y_model_2) print('Cochran\\'s Q Chi^2: %.3f' % chi2) print('Cochran\\'s Q p-value: %.3f' % p_value) Cochran's Q Chi^2: 5.333 Cochran's Q p-value: 0.021 chi2, p_value = mcnemar(mcnemar_table(y_true, y_model_1, y_model_2), corrected=False) print('McNemar\\'s Chi^2: %.3f' % chi2) print('McNemar\\'s p-value: %.3f' % p_value) McNemar's Chi^2: 5.333 McNemar's p-value: 0.021","title":"Example 1 - Cochran's Q test"},{"location":"user_guide/evaluate/cochrans_q/#api","text":"cochrans_q(y_target, *y_model_predictions) Cochran's Q test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-like, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy arrays. 
Returns q, p : float or None, float Returns the Q (chi-squared) value and the p-value. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/cochrans_q/","title":"API"},{"location":"user_guide/evaluate/combined_ftest_5x2cv/","text":"5x2cv combined F test 5x2cv combined F test procedure to compare the performance of two models from mlxtend.evaluate import combined_ftest_5x2cv Overview The 5x2cv combined F test is a procedure for comparing the performance of two models (classifiers or regressors) that was proposed by Alpaydin [1] as a more robust alternative to Dietterich's 5x2cv paired t-test procedure [2] (see paired_ttest_5x2cv). Dietterich's 5x2cv method was in turn designed to address shortcomings in other methods such as the resampled paired t test (see paired_ttest_resampled) and the k-fold cross-validated paired t test (see paired_ttest_kfold_cv). To explain how this method works, let's consider two estimators (e.g., classifiers) A and B. Further, we have a labeled dataset D. In the common hold-out method, we typically split the dataset into 2 parts: a training and a test set. As in the 5x2cv paired t test, we repeat the splitting (50% training and 50% test data) 5 times. In each of the 5 iterations, we fit A and B to the training split and evaluate their performance (p_A and p_B) on the test split. Then, we rotate the training and test sets (the training set becomes the test set and vice versa) and compute the performance again, which results in 2 performance difference measures: p^{(1)} = p^{(1)}_A - p^{(1)}_B and p^{(2)} = p^{(2)}_A - p^{(2)}_B. Then, we estimate the mean and variance of the differences: \\overline{p} = \\frac{p^{(1)} + p^{(2)}}{2} and s^2 = (p^{(1)} - \\overline{p})^2 + (p^{(2)} - \\overline{p})^2. 
The F-statistic proposed by Alpaydin (see the paper for justification) is then computed as f = \\frac{\\sum_{i=1}^{5} \\sum_{j=1}^2 (p_i^{(j)})^2}{2 \\sum_{i=1}^5 s_i^2}, which is approximately F distributed with 10 and 5 degrees of freedom. Using the f statistic, the p value can be computed and compared with a previously chosen significance level, e.g., \\alpha=0.05. If the p value is smaller than \\alpha, we reject the null hypothesis and accept that there is a significant difference between the two models. References [1] Alpaydin, E. (1999). Combined 5\u00d72 cv F test for comparing supervised classification learning algorithms. Neural Computation, 11(8), 1885-1892. [2] Dietterich TG (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput 10:1895\u20131923. Example 1 - 5x2cv combined F test Assume we want to compare two classification algorithms, logistic regression and a decision tree algorithm: from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from mlxtend.data import iris_data from sklearn.model_selection import train_test_split X, y = iris_data() clf1 = LogisticRegression(random_state=1, solver='liblinear', multi_class='ovr') clf2 = DecisionTreeClassifier(random_state=1) X_train, X_test, y_train, y_test = \\ train_test_split(X, y, test_size=0.25, random_state=123) score1 = clf1.fit(X_train, y_train).score(X_test, y_test) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Logistic regression accuracy: %.2f%%' % (score1*100)) print('Decision tree accuracy: %.2f%%' % (score2*100)) Logistic regression accuracy: 97.37% Decision tree accuracy: 94.74% Note that these accuracy values are not used in the combined F test procedure, since new test/train splits are generated during resampling; the values above serve only to build intuition. 
Now, let's assume a significance threshold of \\alpha=0.05 for rejecting the null hypothesis that both algorithms perform equally well on the dataset and conduct the 5x2cv F test: from mlxtend.evaluate import combined_ftest_5x2cv f, p = combined_ftest_5x2cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('F statistic: %.3f' % f) print('p value: %.3f' % p) F statistic: 1.053 p value: 0.509 Since p > \\alpha, we cannot reject the null hypothesis; that is, we find no significant difference between the performance of the two algorithms. While it is generally not recommended to apply statistical tests multiple times without correction for multiple hypothesis testing, let us take a look at an example where the decision tree algorithm is limited to producing a very simple decision boundary that would result in relatively poor performance: clf2 = DecisionTreeClassifier(random_state=1, max_depth=1) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Decision tree accuracy: %.2f%%' % (score2*100)) f, p = combined_ftest_5x2cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('F statistic: %.3f' % f) print('p value: %.3f' % p) Decision tree accuracy: 63.16% F statistic: 34.934 p value: 0.001 Assuming that we conducted this test also with a significance level of \\alpha=0.05, we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value (p \\approx 0.001) is smaller than \\alpha. API combined_ftest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv combined F test proposed by Alpaydin 1999, to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns f : float The F-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/combined_ftest_5x2cv/","title":"5x2cv combined *F* test"},{"location":"user_guide/evaluate/combined_ftest_5x2cv/#5x2cv-combined-f-test","text":"5x2cv combined F test procedure to compare the performance of two models from mlxtend.evaluate import combined_ftest_5x2cv","title":"5x2cv combined F test"},{"location":"user_guide/evaluate/combined_ftest_5x2cv/#overview","text":"The 5x2cv combined F test is a procedure for comparing the performance of two models (classifiers or regressors) that was proposed by Alpaydin [1] as a more robust alternative to Dietterich's 5x2cv paired t-test procedure [2] (see paired_ttest_5x2cv). Dietterich's 5x2cv method was in turn designed to address shortcomings in other methods such as the resampled paired t test (see paired_ttest_resampled) and the k-fold cross-validated paired t test (see paired_ttest_kfold_cv). 
To explain how this method works, let's consider two estimators (e.g., classifiers) A and B. Further, we have a labeled dataset D . In the common hold-out method, we typically split the dataset into 2 parts: a training and a test set. In the 5x2cv procedure, we repeat the splitting (50% training and 50% test data) 5 times. In each of the 5 iterations, we fit A and B to the training split and evaluate their performance ( p_A and p_B ) on the test split. Then, we rotate the training and test sets (the training set becomes the test set and vice versa) and compute the performance again, which results in 2 performance difference measures: p^{(1)} = p^{(1)}_A - p^{(1)}_B and p^{(2)} = p^{(2)}_A - p^{(2)}_B. Then, we estimate the mean and variance of the differences: \\overline{p} = \\frac{p^{(1)} + p^{(2)}}{2} and s^2 = (p^{(1)} - \\overline{p})^2 + (p^{(2)} - \\overline{p})^2. Writing p_i^{(j)} and s_i^2 for these quantities in iteration i, the F statistic proposed by Alpaydin (see the paper for its justification) is then computed as \\mathcal{f} = \\frac{\\sum_{i=1}^{5} \\sum_{j=1}^2 (p_i^{(j)})^2}{2 \\sum_{i=1}^5 s_i^2}, which is approximately F distributed with 10 and 5 degrees of freedom. Using the f statistic, the p value can be computed and compared with a previously chosen significance level, e.g., \\alpha=0.05 . If the p value is smaller than \\alpha , we reject the null hypothesis and accept that there is a significant difference between the two models.","title":"Overview"},{"location":"user_guide/evaluate/combined_ftest_5x2cv/#references","text":"[1] Alpaydin, E. (1999). Combined 5\u00d72 cv F test for comparing supervised classification learning algorithms. Neural computation, 11(8), 1885-1892. [2] Dietterich TG (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. 
Neural Comput 10:1895\u20131923.","title":"References"},{"location":"user_guide/evaluate/combined_ftest_5x2cv/#example-1-5x2cv-combined-f-test","text":"Assume we want to compare two classification algorithms, logistic regression and a decision tree algorithm: from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from mlxtend.data import iris_data from sklearn.model_selection import train_test_split X, y = iris_data() clf1 = LogisticRegression(random_state=1, solver='liblinear', multi_class='ovr') clf2 = DecisionTreeClassifier(random_state=1) X_train, X_test, y_train, y_test = \\ train_test_split(X, y, test_size=0.25, random_state=123) score1 = clf1.fit(X_train, y_train).score(X_test, y_test) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Logistic regression accuracy: %.2f%%' % (score1*100)) print('Decision tree accuracy: %.2f%%' % (score2*100)) Logistic regression accuracy: 97.37% Decision tree accuracy: 94.74% Note that these accuracy values are not used in the combined F test procedure, as new training/test splits are generated during resampling; the values above merely serve to build intuition. Now, let's assume a significance threshold of \\alpha=0.05 for rejecting the null hypothesis that both algorithms perform equally well on the dataset and conduct the 5x2cv combined F test: from mlxtend.evaluate import combined_ftest_5x2cv f, p = combined_ftest_5x2cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('F statistic: %.3f' % f) print('p value: %.3f' % p) F statistic: 1.053 p value: 0.509 Since p > \\alpha , we cannot reject the null hypothesis; that is, the test does not detect a significant difference between the performance of the two algorithms. 
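To make the statistic less of a black box, the following sketch recomputes Alpaydin's combined F statistic from the overview, given a 5x2 table of performance differences (one row per iteration i, one column per fold j). The helper names combined_f_statistic and combined_f_pvalue are hypothetical and not part of mlxtend; the p-value helper assumes SciPy is installed and uses the upper tail of the F distribution with 10 and 5 degrees of freedom, since large values of f indicate a difference. The toy differences are made up purely to exercise the formula.

```python
# Sketch of Alpaydin's combined 5x2cv F statistic (hypothetical helper names).
# diffs[i][j] = score of model A minus score of model B in iteration i, fold j.

def combined_f_statistic(diffs):
    if len(diffs) != 5 or any(len(row) != 2 for row in diffs):
        raise ValueError('expected 5 rows with 2 differences each')
    # Numerator: sum of all 10 squared differences.
    numerator = sum(d ** 2 for row in diffs for d in row)
    # Denominator: twice the sum of the per-iteration variance estimates s_i^2.
    variance_sum = 0.0
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2.0
        variance_sum += (d1 - mean) ** 2 + (d2 - mean) ** 2
    return numerator / (2.0 * variance_sum)


def combined_f_pvalue(f_stat):
    # Upper-tail probability of an F distribution with 10 and 5 degrees of freedom.
    from scipy.stats import f as f_dist  # assumes SciPy is available
    return f_dist.sf(f_stat, 10, 5)


# Toy differences, made up purely to exercise the formula.
diffs = [[0.02, 0.00], [0.01, 0.03], [0.00, 0.02], [0.04, 0.02], [0.01, 0.01]]
f_stat = combined_f_statistic(diffs)
print(round(f_stat, 3))  # 2.5
```

Scaling all differences by a constant leaves f unchanged, because numerator and denominator are both quadratic in the differences.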
While it is generally not recommended to apply statistical tests multiple times without correction for multiple hypothesis testing, let us take a look at an example where the decision tree algorithm is limited to producing a very simple decision boundary that results in relatively poor performance: clf2 = DecisionTreeClassifier(random_state=1, max_depth=1) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Decision tree accuracy: %.2f%%' % (score2*100)) f, p = combined_ftest_5x2cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('F statistic: %.3f' % f) print('p value: %.3f' % p) Decision tree accuracy: 63.16% F statistic: 34.934 p value: 0.001 Assuming that we also conducted this test with a significance level of \\alpha=0.05 , we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value ( p \\approx 0.001 ) is smaller than \\alpha .","title":"Example 1 - 5x2cv combined F test"},{"location":"user_guide/evaluate/combined_ftest_5x2cv/#api","text":"combined_ftest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv combined F test proposed by Alpaydin (1999) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. 
If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y) ; see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns f : float The F-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences in the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/combined_ftest_5x2cv/","title":"API"},{"location":"user_guide/evaluate/confusion_matrix/","text":"Confusion Matrix Functions for generating confusion matrices. from mlxtend.evaluate import confusion_matrix from mlxtend.plotting import plot_confusion_matrix Overview Confusion Matrix The confusion matrix (or error matrix ) is one way to summarize the performance of a classifier for binary classification tasks. This square matrix consists of columns and rows that list the number of instances as absolute or relative \"actual class\" vs. \"predicted class\" ratios. Let P be the label of class 1 and N be the label of a second class or the label of all classes that are not class 1 in a multi-class setting. 
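The counting rule behind a confusion matrix is easy to sketch in plain Python. The helper names below are hypothetical and this is not the mlxtend implementation; the sketch assumes rows index the actual class and columns the predicted class (the layout of the arrays in the examples that follow), and it shows one common way to obtain relative values, namely dividing each row by the number of actual instances of that class.

```python
# Minimal confusion-matrix sketch in plain Python (not mlxtend's implementation).
# Rows index the actual class, columns the predicted class.

def count_confusion(y_target, y_predicted):
    labels = sorted(set(y_target) | set(y_predicted))
    index = {label: k for k, label in enumerate(labels)}
    mat = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_target, y_predicted):
        mat[index[actual]][index[predicted]] += 1
    return mat


def row_ratios(mat):
    # Relative version: each row divided by the number of actual instances of
    # that class; rows without any actual instances stay all zeros.
    return [[count / (sum(row) or 1) for count in row] for row in mat]


y_target = [0, 0, 1, 0, 0, 1, 1, 1]
y_predicted = [1, 0, 1, 0, 0, 0, 0, 1]
cm = count_confusion(y_target, y_predicted)
print(cm)              # [[3, 1], [2, 2]]
print(row_ratios(cm))  # [[0.75, 0.25], [0.5, 0.5]]
```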
References - Example 1 - Binary classification from mlxtend.evaluate import confusion_matrix y_target = [0, 0, 1, 0, 0, 1, 1, 1] y_predicted = [1, 0, 1, 0, 0, 0, 0, 1] cm = confusion_matrix(y_target=y_target, y_predicted=y_predicted) cm array([[3, 1], [2, 2]]) To visualize the confusion matrix using matplotlib, see the utility function mlxtend.plotting.plot_confusion_matrix : import matplotlib.pyplot as plt from mlxtend.plotting import plot_confusion_matrix fig, ax = plot_confusion_matrix(conf_mat=cm) plt.show() Example 2 - Multi-class classification from mlxtend.evaluate import confusion_matrix y_target = [1, 1, 1, 0, 0, 2, 0, 3] y_predicted = [1, 0, 1, 0, 0, 2, 1, 3] cm = confusion_matrix(y_target=y_target, y_predicted=y_predicted, binary=False) cm array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]) To visualize the confusion matrix using matplotlib, see the utility function mlxtend.plotting.plot_confusion_matrix : import matplotlib.pyplot as plt from mlxtend.plotting import plot_confusion_matrix fig, ax = plot_confusion_matrix(conf_mat=cm) plt.show() Example 3 - Multi-class to binary By setting binary=True , all class labels that are not the positive class label are mapped to class 0. The positive class label becomes class 1. import matplotlib.pyplot as plt from mlxtend.evaluate import confusion_matrix y_target = [1, 1, 1, 0, 0, 2, 0, 3] y_predicted = [1, 0, 1, 0, 0, 2, 1, 3] cm = confusion_matrix(y_target=y_target, y_predicted=y_predicted, binary=True, positive_label=1) cm array([[4, 1], [1, 2]]) To visualize the confusion matrix using matplotlib, see the utility function mlxtend.plotting.plot_confusion_matrix : from mlxtend.plotting import plot_confusion_matrix fig, ax = plot_confusion_matrix(conf_mat=cm) plt.show() API confusion_matrix(y_target, y_predicted, binary=False, positive_label=1) Compute a confusion matrix/contingency table. Parameters y_target : array-like, shape=[n_samples] True class labels. 
y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: False) Maps a multi-class problem onto a binary confusion matrix, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. Returns mat : array-like, shape=[n_classes, n_classes] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix/","title":"Confusion Matrix"},{"location":"user_guide/evaluate/confusion_matrix/#confusion-matrix","text":"Functions for generating confusion matrices. from mlxtend.evaluate import confusion_matrix from mlxtend.plotting import plot_confusion_matrix","title":"Confusion Matrix"},{"location":"user_guide/evaluate/confusion_matrix/#overview","text":"","title":"Overview"},{"location":"user_guide/evaluate/confusion_matrix/#confusion-matrix_1","text":"The confusion matrix (or error matrix ) is one way to summarize the performance of a classifier for binary classification tasks. This square matrix consists of columns and rows that list the number of instances as absolute or relative \"actual class\" vs. \"predicted class\" ratios. 
Let P be the label of class 1 and N be the label of a second class or the label of all classes that are not class 1 in a multi-class setting.","title":"Confusion Matrix"},{"location":"user_guide/evaluate/confusion_matrix/#references","text":"-","title":"References"},{"location":"user_guide/evaluate/confusion_matrix/#example-1-binary-classification","text":"from mlxtend.evaluate import confusion_matrix y_target = [0, 0, 1, 0, 0, 1, 1, 1] y_predicted = [1, 0, 1, 0, 0, 0, 0, 1] cm = confusion_matrix(y_target=y_target, y_predicted=y_predicted) cm array([[3, 1], [2, 2]]) To visualize the confusion matrix using matplotlib, see the utility function mlxtend.plotting.plot_confusion_matrix : import matplotlib.pyplot as plt from mlxtend.plotting import plot_confusion_matrix fig, ax = plot_confusion_matrix(conf_mat=cm) plt.show()","title":"Example 1 - Binary classification"},{"location":"user_guide/evaluate/confusion_matrix/#example-2-multi-class-classification","text":"from mlxtend.evaluate import confusion_matrix y_target = [1, 1, 1, 0, 0, 2, 0, 3] y_predicted = [1, 0, 1, 0, 0, 2, 1, 3] cm = confusion_matrix(y_target=y_target, y_predicted=y_predicted, binary=False) cm array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]) To visualize the confusion matrix using matplotlib, see the utility function mlxtend.plotting.plot_confusion_matrix : import matplotlib.pyplot as plt from mlxtend.plotting import plot_confusion_matrix fig, ax = plot_confusion_matrix(conf_mat=cm) plt.show()","title":"Example 2 - Multi-class classification"},{"location":"user_guide/evaluate/confusion_matrix/#example-3-multi-class-to-binary","text":"By setting binary=True , all class labels that are not the positive class label are mapped to class 0. The positive class label becomes class 1. 
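The relabeling rule just described can be sketched as follows, using the same labels as Example 3. to_binary is a hypothetical helper written for illustration; the sketch assumes the 2x2 result uses the same layout as the other examples (rows: actual class 0 and 1, columns: predicted class 0 and 1).

```python
# Sketch of the multi-class-to-binary mapping (hypothetical helper, not mlxtend code).

def to_binary(labels, positive_label=1):
    # The positive class becomes 1, every other class becomes 0.
    return [1 if label == positive_label else 0 for label in labels]


y_target = [1, 1, 1, 0, 0, 2, 0, 3]
y_predicted = [1, 0, 1, 0, 0, 2, 1, 3]
t_bin = to_binary(y_target, positive_label=1)
p_bin = to_binary(y_predicted, positive_label=1)

# Count the 2x2 matrix: rows = actual (0, 1), columns = predicted (0, 1).
mat = [[0, 0], [0, 0]]
for actual, predicted in zip(t_bin, p_bin):
    mat[actual][predicted] += 1
print(mat)  # [[4, 1], [1, 2]]
```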
import matplotlib.pyplot as plt from mlxtend.evaluate import confusion_matrix y_target = [1, 1, 1, 0, 0, 2, 0, 3] y_predicted = [1, 0, 1, 0, 0, 2, 1, 3] cm = confusion_matrix(y_target=y_target, y_predicted=y_predicted, binary=True, positive_label=1) cm array([[4, 1], [1, 2]]) To visualize the confusion matrix using matplotlib, see the utility function mlxtend.plotting.plot_confusion_matrix : from mlxtend.plotting import plot_confusion_matrix fig, ax = plot_confusion_matrix(conf_mat=cm) plt.show()","title":"Example 3 - Multi-class to binary"},{"location":"user_guide/evaluate/confusion_matrix/#api","text":"confusion_matrix(y_target, y_predicted, binary=False, positive_label=1) Compute a confusion matrix/contingency table. Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: False) Maps a multi-class problem onto a binary confusion matrix, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. Returns mat : array-like, shape=[n_classes, n_classes] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix/","title":"API"},{"location":"user_guide/evaluate/feature_importance_permutation/","text":"Feature Importance Permutation A function to estimate the feature importance of classifiers and regressors based on permutation importance . from mlxtend.evaluate import feature_importance_permutation Overview The permutation importance is an intuitive, model-agnostic method to estimate the feature importance for classifier and regression models. 
The approach is relatively simple and straightforward: 1. Take a model that was fit to the training dataset. 2. Estimate the predictive performance of the model on an independent dataset (e.g., validation dataset) and record it as the baseline performance. 3. For each feature i : randomly permute feature column i in the original dataset, record the predictive performance of the model on the dataset with the permuted column, and compute the feature importance as the difference between the baseline performance (step 2) and the performance on the permuted dataset. Permutation importance is generally considered a relatively efficient technique that works well in practice [1], while a drawback is that the importance of correlated features may be overestimated [2]. References [1] Terence Parr, Kerem Turgutlu, Christopher Csiszar, and Jeremy Howard. Beware Default Random Forest Importances (http://parrt.cs.usfca.edu/doc/rf-importance/index.html) [2] Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC bioinformatics, 9(1), 307. Example 1 -- Feature Importance for Classifiers The following example illustrates the feature importance estimation via permutation importance for classification models. 
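The three steps of the overview can be sketched in a few lines of NumPy. permutation_importance_sketch is a hypothetical stand-alone helper, not the mlxtend source; it assumes predict maps a 2D array to a vector of predictions and metric(y_true, y_pred) returns a score where larger is better. It mirrors the return shapes described in the API ([n_features] and [n_features, num_rounds]). A toy model that only looks at feature 0 is used so that the expected outcome is known in advance: permuting feature 1 cannot change any prediction, so its importance is exactly zero.

```python
import numpy as np


def permutation_importance_sketch(predict, X, y, metric, num_rounds=1, seed=None):
    # Hypothetical re-implementation of the three overview steps (not mlxtend's code).
    rng = np.random.RandomState(seed)
    baseline = metric(y, predict(X))  # step 2: baseline performance
    all_vals = np.zeros((X.shape[1], num_rounds))
    for r in range(num_rounds):
        for i in range(X.shape[1]):
            X_perm = X.copy()
            # Step 3: permute column i only, then record the drop in performance.
            X_perm[:, i] = rng.permutation(X_perm[:, i])
            all_vals[i, r] = baseline - metric(y, predict(X_perm))
    return all_vals.mean(axis=1), all_vals


def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)


def toy_predict(X):
    # Toy model that only looks at feature 0.
    return (X[:, 0] > 0).astype(int)


data_rng = np.random.RandomState(0)
X = data_rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

mean_vals, all_vals = permutation_importance_sketch(
    toy_predict, X, y, accuracy, num_rounds=5, seed=1)
print(mean_vals.shape, all_vals.shape)  # (2,) (2, 5)
```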
import numpy as np import matplotlib.pyplot as plt from sklearn.svm import SVC from sklearn.model_selection import train_test_split from mlxtend.evaluate import feature_importance_permutation Generate a toy dataset from sklearn.datasets import make_classification from sklearn.ensemble import RandomForestClassifier # Build a classification task using 3 informative features X, y = make_classification(n_samples=10000, n_features=10, n_informative=3, n_redundant=0, n_repeated=0, n_classes=2, random_state=0, shuffle=False) X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3, random_state=1, stratify=y) Feature importance via random forest First, we compute the feature importance directly from the random forest via mean impurity decrease (described after the code section): forest = RandomForestClassifier(n_estimators=250, random_state=0) forest.fit(X_train, y_train) print('Training accuracy:', np.mean(forest.predict(X_train) == y_train)*100) print('Test accuracy:', np.mean(forest.predict(X_test) == y_test)*100) importance_vals = forest.feature_importances_ print(importance_vals) Training accuracy: 100.0 Test accuracy: 95.0666666667 [ 0.283357 0.30846795 0.24204291 0.02229767 0.02364941 0.02390578 0.02501543 0.0234225 0.02370816 0.0241332 ] There are several strategies for computing the feature importance in random forests. The method implemented in scikit-learn (used in the previous code example) is based on Breiman and Friedman's CART (Breiman, Friedman, \"Classification and regression trees\", 1984), the so-called mean impurity decrease . Here, the importance value of a feature is computed by averaging the impurity decrease for that feature, when splitting a parent node into two child nodes, across all the trees in the ensemble. Note that the impurity decrease values are weighted by the number of samples that are in the respective nodes. 
This process is repeated for all features in the dataset, and the feature importance values are then normalized so that they sum up to 1. In CART, the authors also note that this fast way of computing feature importance values is relatively consistent with the permutation importance. Next, let's visualize the feature importance values from the random forest including a measure of the mean impurity decrease variability (here: standard deviation): std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0) indices = np.argsort(importance_vals)[::-1] # Plot the feature importances of the forest plt.figure() plt.title(\"Random Forest feature importance\") plt.bar(range(X.shape[1]), importance_vals[indices], yerr=std[indices], align=\"center\") plt.xticks(range(X.shape[1]), indices) plt.xlim([-1, X.shape[1]]) plt.ylim([0, 0.5]) plt.show() As we can see, the features 1, 0, and 2 are estimated to be the most informative ones for the random forest classifier. Next, let's compute the feature importance via the permutation importance approach. Permutation Importance imp_vals, _ = feature_importance_permutation( predict_method=forest.predict, X=X_test, y=y_test, metric='accuracy', num_rounds=1, seed=1) imp_vals array([ 0.26833333, 0.26733333, 0.261 , -0.002 , -0.00033333, 0.00066667, 0.00233333, 0.00066667, 0.00066667, -0.00233333]) Note that feature_importance_permutation returns two arrays. The first array (here: imp_vals ) contains the actual importance values we are interested in. If num_rounds > 1 , the permutation is repeated multiple times (with different random seeds), and in this case the first array contains the average value of the importance computed from the different runs. The second array (here, assigned to _ , because we are not using it) then contains all individual values from these runs (more about that later). 
Now, let's also visualize the importance values in a barplot: indices = np.argsort(imp_vals)[::-1] plt.figure() plt.title(\"Random Forest feature importance via permutation importance\") plt.bar(range(X.shape[1]), imp_vals[indices]) plt.xticks(range(X.shape[1]), indices) plt.xlim([-1, X.shape[1]]) plt.ylim([0, 0.5]) plt.show() As we can see, the same three features (0, 1, and 2) are again estimated to be the most important ones, which is consistent with the feature importance values that we computed via the mean impurity decrease method earlier. (Note that in the context of random forests, the feature importance via permutation importance is typically computed using the out-of-bag samples of a random forest, whereas in this implementation, an independent dataset is used.) Previously, it was mentioned that the permutation is repeated multiple times if num_rounds > 1 . In this case, the second array returned by feature_importance_permutation contains the importance values for these individual runs (the array has shape [num_features, num_rounds]), which we can use to estimate the variability between these runs. imp_vals, imp_all = feature_importance_permutation( predict_method=forest.predict, X=X_test, y=y_test, metric='accuracy', num_rounds=10, seed=1) std = np.std(imp_all, axis=1) indices = np.argsort(imp_vals)[::-1] plt.figure() plt.title(\"Random Forest feature importance via permutation importance w. std. dev.\") plt.bar(range(X.shape[1]), imp_vals[indices], yerr=std[indices]) plt.xticks(range(X.shape[1]), indices) plt.xlim([-1, X.shape[1]]) plt.show() Note that the feature importance values do not sum up to one, since they are not normalized (you can normalize them if you'd like by dividing them by the sum of importance values). Here, the main point is to look at the importance values relative to each other and not to over-interpret the absolute values. 
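As a small illustration of the normalization mentioned above, the sketch below rescales the single-round importance values printed earlier for the random forest (copied by hand from that output). Negative importances remain negative after dividing by the sum, so the normalized values are still best read relative to each other.

```python
import numpy as np

# Single-round permutation importances as printed earlier for the random forest.
imp_vals = np.array([0.26833333, 0.26733333, 0.261, -0.002, -0.00033333,
                     0.00066667, 0.00233333, 0.00066667, 0.00066667, -0.00233333])

# Dividing by the sum rescales the values so that they add up to one.
normalized = imp_vals / imp_vals.sum()

# Rank features from most to least important (same order before and after scaling).
ranking = np.argsort(normalized)[::-1]
print(ranking[:3])  # [0 1 2]
```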
Support Vector Machines While the permutation importance approach yields results that are generally consistent with the mean impurity decrease feature importance values from a random forest, it is model-agnostic and can be used with any kind of classifier or regressor. The example below applies the feature_importance_permutation function to a support vector machine: from sklearn.svm import SVC svm = SVC(C=1.0, kernel='rbf') svm.fit(X_train, y_train) print('Training accuracy', np.mean(svm.predict(X_train) == y_train)*100) print('Test accuracy', np.mean(svm.predict(X_test) == y_test)*100) Training accuracy 95.0857142857 Test accuracy 94.9666666667 imp_vals, imp_all = feature_importance_permutation( predict_method=svm.predict, X=X_test, y=y_test, metric='accuracy', num_rounds=10, seed=1) std = np.std(imp_all, axis=1) indices = np.argsort(imp_vals)[::-1] plt.figure() plt.title(\"SVM feature importance via permutation importance\") plt.bar(range(X.shape[1]), imp_vals[indices], yerr=std[indices]) plt.xticks(range(X.shape[1]), indices) plt.xlim([-1, X.shape[1]]) plt.show() Example 2 -- Feature Importance for Regressors import numpy as np import matplotlib.pyplot as plt from mlxtend.evaluate import feature_importance_permutation from sklearn.model_selection import train_test_split from sklearn.datasets import make_regression from sklearn.svm import SVR X, y = make_regression(n_samples=1000, n_features=5, n_informative=2, n_targets=1, random_state=123, shuffle=False) X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3, random_state=123) svm = SVR(kernel='rbf') svm.fit(X_train, y_train) imp_vals, _ = feature_importance_permutation( predict_method=svm.predict, X=X_test, y=y_test, metric='r2', num_rounds=1, seed=1) imp_vals array([ 0.43676245, 0.22231268, 0.00146906, 0.01611528, -0.00522067]) plt.figure() plt.bar(range(X.shape[1]), imp_vals) plt.xticks(range(X.shape[1])) plt.xlim([-1, X.shape[1]]) plt.ylim([0, 0.5]) plt.show() API 
feature_importance_permutation(X, y, predict_method, metric, num_rounds=1, seed=None) Feature importance estimation via permutation importance Parameters X : NumPy array, shape = [n_samples, n_features] Dataset, where n_samples is the number of samples and n_features is the number of features. y : NumPy array, shape = [n_samples] Target values. predict_method : prediction function A callable function that predicts the target values from X. metric : str, callable The metric for evaluating the feature importance through permutation. The string 'accuracy' is recommended for classifiers and the string 'r2' for regressors. Alternatively, a custom scoring function can be provided (e.g., metric=scoring_func ) that accepts two arguments, y_true and y_pred, which have the same shape as the y array. num_rounds : int (default=1) Number of rounds the feature columns are permuted to compute the permutation importance. seed : int or None (default=None) Random seed for permuting the feature columns. Returns mean_importance_vals, all_importance_vals : NumPy arrays. The first array, mean_importance_vals , has shape [n_features, ] and contains the importance values for all features. The second array has shape [n_features, num_rounds] and contains the feature importance for each repetition. If num_rounds=1, it contains the same values as the first array, mean_importance_vals. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/","title":"Feature Importance Permutation"},{"location":"user_guide/evaluate/feature_importance_permutation/#feature-importance-permutation","text":"A function to estimate the feature importance of classifiers and regressors based on permutation importance . 
from mlxtend.evaluate import feature_importance_permutation","title":"Feature Importance Permutation"},{"location":"user_guide/evaluate/feature_importance_permutation/#overview","text":"The permutation importance is an intuitive, model-agnostic method to estimate the feature importance for classifier and regression models. The approach is relatively simple and straightforward: 1. Take a model that was fit to the training dataset. 2. Estimate the predictive performance of the model on an independent dataset (e.g., validation dataset) and record it as the baseline performance. 3. For each feature i : randomly permute feature column i in the original dataset, record the predictive performance of the model on the dataset with the permuted column, and compute the feature importance as the difference between the baseline performance (step 2) and the performance on the permuted dataset. Permutation importance is generally considered a relatively efficient technique that works well in practice [1], while a drawback is that the importance of correlated features may be overestimated [2].","title":"Overview"},{"location":"user_guide/evaluate/feature_importance_permutation/#references","text":"[1] Terence Parr, Kerem Turgutlu, Christopher Csiszar, and Jeremy Howard. Beware Default Random Forest Importances (http://parrt.cs.usfca.edu/doc/rf-importance/index.html) [2] Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC bioinformatics, 9(1), 307.","title":"References"},{"location":"user_guide/evaluate/feature_importance_permutation/#example-1-feature-importance-for-classifiers","text":"The following example illustrates the feature importance estimation via permutation importance for classification models. 
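Before the full example, the permutation step of the overview can be illustrated on a tiny array. The sketch assumes a NumPy array and permutes column i with RandomState.permutation: the permuted column keeps exactly the same values (and hence the same marginal distribution), while its pairing with the rows, and therefore with the targets, is scrambled. All other columns are untouched.

```python
import numpy as np

# Illustration of the randomly-permute-column-i step on a tiny toy array.
rng = np.random.RandomState(123)
X = np.arange(12, dtype=float).reshape(6, 2)
i = 1

X_perm = X.copy()
X_perm[:, i] = rng.permutation(X_perm[:, i])

# Same multiset of values in column i, so the marginal distribution is unchanged.
same_values = np.array_equal(np.sort(X_perm[:, i]), np.sort(X[:, i]))
# Every other column is left exactly as it was.
other_untouched = np.array_equal(np.delete(X_perm, i, axis=1), np.delete(X, i, axis=1))
print(same_values, other_untouched)  # True True
```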
import numpy as np import matplotlib.pyplot as plt from sklearn.svm import SVC from sklearn.model_selection import train_test_split from mlxtend.evaluate import feature_importance_permutation","title":"Example 1 -- Feature Importance for Classifiers"},{"location":"user_guide/evaluate/feature_importance_permutation/#generate-a-toy-dataset","text":"from sklearn.datasets import make_classification from sklearn.ensemble import RandomForestClassifier # Build a classification task using 3 informative features X, y = make_classification(n_samples=10000, n_features=10, n_informative=3, n_redundant=0, n_repeated=0, n_classes=2, random_state=0, shuffle=False) X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3, random_state=1, stratify=y)","title":"Generate a toy dataset"},{"location":"user_guide/evaluate/feature_importance_permutation/#feature-importance-via-random-forest","text":"First, we compute the feature importance directly from the random forest via mean impurity decrease (described after the code section): forest = RandomForestClassifier(n_estimators=250, random_state=0) forest.fit(X_train, y_train) print('Training accuracy:', np.mean(forest.predict(X_train) == y_train)*100) print('Test accuracy:', np.mean(forest.predict(X_test) == y_test)*100) importance_vals = forest.feature_importances_ print(importance_vals) Training accuracy: 100.0 Test accuracy: 95.0666666667 [ 0.283357 0.30846795 0.24204291 0.02229767 0.02364941 0.02390578 0.02501543 0.0234225 0.02370816 0.0241332 ] There are several strategies for computing the feature importance in random forests. The method implemented in scikit-learn (used in the previous code example) is based on Breiman and Friedman's CART (Breiman, Friedman, \"Classification and regression trees\", 1984), the so-called mean impurity decrease . 
Here, the importance value of a feature is computed by averaging the impurity decrease for that feature, when splitting a parent node into two child nodes, across all the trees in the ensemble. Note that the impurity decrease values are weighted by the number of samples that are in the respective nodes. This process is repeated for all features in the dataset, and the feature importance values are then normalized so that they sum up to 1. In CART, the authors also note that this fast way of computing feature importance values is relatively consistent with the permutation importance. Next, let's visualize the feature importance values from the random forest including a measure of the mean impurity decrease variability (here: standard deviation): std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0) indices = np.argsort(importance_vals)[::-1] # Plot the feature importances of the forest plt.figure() plt.title(\"Random Forest feature importance\") plt.bar(range(X.shape[1]), importance_vals[indices], yerr=std[indices], align=\"center\") plt.xticks(range(X.shape[1]), indices) plt.xlim([-1, X.shape[1]]) plt.ylim([0, 0.5]) plt.show() As we can see, the features 1, 0, and 2 are estimated to be the most informative ones for the random forest classifier. Next, let's compute the feature importance via the permutation importance approach.","title":"Feature importance via random forest"},{"location":"user_guide/evaluate/feature_importance_permutation/#permutation-importance","text":"imp_vals, _ = feature_importance_permutation( predict_method=forest.predict, X=X_test, y=y_test, metric='accuracy', num_rounds=1, seed=1) imp_vals array([ 0.26833333, 0.26733333, 0.261 , -0.002 , -0.00033333, 0.00066667, 0.00233333, 0.00066667, 0.00066667, -0.00233333]) Note that feature_importance_permutation returns two arrays. The first array (here: imp_vals ) contains the actual importance values we are interested in. 
If num_rounds > 1 , the permutation is repeated multiple times (with different random seeds), and in this case the first array contains the average value of the importance computed from the different runs. The second array (here, assigned to _ , because we are not using it) then contains all individual values from these runs (more about that later). Now, let's also visualize the importance values in a barplot: indices = np.argsort(imp_vals)[::-1] plt.figure() plt.title(\"Random Forest feature importance via permutation importance\") plt.bar(range(X.shape[1]), imp_vals[indices]) plt.xticks(range(X.shape[1]), indices) plt.xlim([-1, X.shape[1]]) plt.ylim([0, 0.5]) plt.show() As we can see, the same three features (0, 1, and 2) are again estimated to be the most important ones, which is consistent with the feature importance values that we computed via the mean impurity decrease method earlier. (Note that in the context of random forests, the feature importance via permutation importance is typically computed using the out-of-bag samples of a random forest, whereas in this implementation, an independent dataset is used.) Previously, it was mentioned that the permutation is repeated multiple times if num_rounds > 1 . In this case, the second array returned by feature_importance_permutation contains the importance values for these individual runs (the array has shape [num_features, num_rounds]), which we can use to estimate the variability between these runs. imp_vals, imp_all = feature_importance_permutation( predict_method=forest.predict, X=X_test, y=y_test, metric='accuracy', num_rounds=10, seed=1) std = np.std(imp_all, axis=1) indices = np.argsort(imp_vals)[::-1] plt.figure() plt.title(\"Random Forest feature importance via permutation importance w. std. 
dev.\") plt.bar(range(X.shape[1]), imp_vals[indices], yerr=std[indices]) plt.xticks(range(X.shape[1]), indices) plt.xlim([-1, X.shape[1]]) plt.show() Note that the feature importance values do not sum up to one, since they are not normalized (you can normalize them if you'd like by dividing them by the sum of importance values). Here, the main point is to look at the importance values relative to each other and not to over-interpret the absolute values.","title":"Permutation Importance"},{"location":"user_guide/evaluate/feature_importance_permutation/#support-vector-machines","text":"While the permutation importance approach yields results that are generally consistent with the mean impurity decrease feature importance values from a random forest, it is model-agnostic and can be used with any kind of classifier or regressor. The example below applies the feature_importance_permutation function to a support vector machine: from sklearn.svm import SVC svm = SVC(C=1.0, kernel='rbf') svm.fit(X_train, y_train) print('Training accuracy', np.mean(svm.predict(X_train) == y_train)*100) print('Test accuracy', np.mean(svm.predict(X_test) == y_test)*100) Training accuracy 95.0857142857 Test accuracy 94.9666666667 imp_vals, imp_all = feature_importance_permutation( predict_method=svm.predict, X=X_test, y=y_test, metric='accuracy', num_rounds=10, seed=1) std = np.std(imp_all, axis=1) indices = np.argsort(imp_vals)[::-1] plt.figure() plt.title(\"SVM feature importance via permutation importance\") plt.bar(range(X.shape[1]), imp_vals[indices], yerr=std[indices]) plt.xticks(range(X.shape[1]), indices) plt.xlim([-1, X.shape[1]]) plt.show()","title":"Support Vector Machines"},{"location":"user_guide/evaluate/feature_importance_permutation/#example-1-feature-importance-for-regressors","text":"import numpy as np import matplotlib.pyplot as plt from mlxtend.evaluate import feature_importance_permutation from sklearn.model_selection import train_test_split 
from sklearn.datasets import make_regression from sklearn.svm import SVR X, y = make_regression(n_samples=1000, n_features=5, n_informative=2, n_targets=1, random_state=123, shuffle=False) X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.3, random_state=123) svm = SVR(kernel='rbf') svm.fit(X_train, y_train) imp_vals, _ = feature_importance_permutation( predict_method=svm.predict, X=X_test, y=y_test, metric='r2', num_rounds=1, seed=1) imp_vals array([ 0.43676245, 0.22231268, 0.00146906, 0.01611528, -0.00522067]) plt.figure() plt.bar(range(X.shape[1]), imp_vals) plt.xticks(range(X.shape[1])) plt.xlim([-1, X.shape[1]]) plt.ylim([0, 0.5]) plt.show()","title":"Example 1 -- Feature Importance for Regressors"},{"location":"user_guide/evaluate/feature_importance_permutation/#api","text":"feature_importance_permutation(X, y, predict_method, metric, num_rounds=1, seed=None) Feature importance estimation via permutation importance Parameters X : NumPy array, shape = [n_samples, n_features] Dataset, where n_samples is the number of samples and n_features is the number of features. y : NumPy array, shape = [n_samples] Target values. predict_method : prediction function A callable function that predicts the target values from X. metric : str, callable The metric for evaluating the feature importance through permutation. The string 'accuracy' is recommended for classifiers and the string 'r2' for regressors. Optionally, a custom scoring function (e.g., metric=scoring_func ) can be passed; it must accept two arguments, y_true and y_pred, which have the same shape as the y array. num_rounds : int (default=1) Number of rounds in which the feature columns are permuted to compute the permutation importance. seed : int or None (default=None) Random seed for permuting the feature columns. Returns mean_importance_vals, all_importance_vals : NumPy arrays. 
The first array, mean_importance_vals, has shape [n_features, ] and contains the mean importance values for all features. The second array has shape [n_features, num_rounds] and contains the feature importance for each repetition. If num_rounds=1, it contains the same values as the first array, mean_importance_vals. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/","title":"API"},{"location":"user_guide/evaluate/ftest/","text":"F-Test F-test for comparing the performance of multiple classifiers. from mlxtend.evaluate import ftest Overview In the context of evaluating machine learning models, the F-test by George W. Snedecor [1] can be regarded as analogous to Cochran's Q test that can be applied to evaluate multiple classifiers (i.e., whether their accuracies estimated on a test set differ) as described by Looney [2][3]. More formally, consider the task of testing the null hypothesis that there is no difference between the classification accuracies p_i [1]: H_0: p_1 = p_2 = \\cdots = p_L. Let \\{D_1, \\dots , D_L\\} be a set of classifiers that have all been tested on the same dataset. If the L classifiers don't perform differently, then the F statistic is distributed according to an F distribution with (L-1) and (L-1)\\times N degrees of freedom, where N is the number of examples in the test set. The calculation of the F statistic consists of several components, which are listed below (adapted from [3]). Sum of squares of the classifiers: SSA = N \\sum_{i=1}^{L} ACC_i^2 - N \\cdot L \\cdot ACC_{avg}^2, where ACC_i is the accuracy of classifier D_i on the test set and ACC_{avg} is the average accuracy of the L classifiers. Further, let L_j be the number of classifiers out of L that correctly classified object \\mathbf{z}_j \\in \\mathbf{Z}_{N} , where \\mathbf{Z}_{N} = \\{\\mathbf{z}_1, \\dots, \\mathbf{z}_{N}\\} is the test dataset on which the classifiers are evaluated. 
The sum of squares for the objects: SSB = \\frac{1}{L} \\sum_{j=1}^N (L_j)^2 - L\\cdot N \\cdot ACC_{avg}^2, where ACC_{avg} = \\frac{1}{L} \\sum_{i=1}^L ACC_i is the average of the accuracies of the different models. The total sum of squares: SST = L\\cdot N \\cdot ACC_{avg} (1 - ACC_{avg}). The sum of squares for the classification--object interaction: SSAB = SST - SSA - SSB. The mean SSA and mean SSAB values: MSA = \\frac{SSA}{L-1}, and MSAB = \\frac{SSAB}{(L-1) (N-1)}. From the MSA and MSAB, we can then calculate the F-value as F = \\frac{MSA}{MSAB}. After computing the F-value, we can then look up the p-value in an F-distribution table for the corresponding degrees of freedom or compute it as one minus the cumulative F-distribution function evaluated at F. In practice, if we successfully rejected the null hypothesis at a previously chosen significance threshold, we could perform multiple post hoc pair-wise tests -- for example, McNemar tests with a Bonferroni correction -- to determine which pairs have different population proportions. References [1] Snedecor, George W. and Cochran, William G. (1989), Statistical Methods, Eighth Edition, Iowa State University Press. [2] Looney, Stephen W. \"A statistical technique for comparing the accuracies of several classifiers.\" Pattern Recognition Letters 8, no. 1 (1988): 5-9. [3] Kuncheva, Ludmila I. Combining pattern classifiers: methods and algorithms. John Wiley & Sons, 2004. 
Example 1 - F-test import numpy as np from mlxtend.evaluate import ftest ## Dataset: # ground truth labels of the test dataset: y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) # predictions by 3 classifiers (`y_model_1`, `y_model_2`, and `y_model_3`): y_model_1 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_2 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_3 = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]) Assuming a significance level \\alpha=0.05 , we can conduct the F-test as follows to test the null hypothesis that there is no difference between the classification accuracies, H_0: p_1 = p_2 = \\cdots = p_L : f, p_value = ftest(y_true, y_model_1, y_model_2, y_model_3) print('F: %.3f' % f) print('p-value: %.3f' % p_value) F: 3.873 p-value: 0.022 Since the p-value is smaller than \\alpha , we can reject the null hypothesis and conclude that there is a difference between the classification 
accuracies. As mentioned in the introduction, we could now perform multiple post hoc pair-wise tests -- for example, McNemar tests with a Bonferroni correction -- to determine which pairs have different population proportions. API ftest(y_target, *y_model_predictions) F-test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy array. Returns f, p : float or None, float Returns the F-value and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/ftest/","title":"F-Test"},{"location":"user_guide/evaluate/ftest/#f-test","text":"F-test for comparing the performance of multiple classifiers. from mlxtend.evaluate import ftest","title":"F-Test"},{"location":"user_guide/evaluate/ftest/#overview","text":"In the context of evaluating machine learning models, the F-test by George W. Snedecor [1] can be regarded as analogous to Cochran's Q test that can be applied to evaluate multiple classifiers (i.e., whether their accuracies estimated on a test set differ) as described by Looney [2][3]. More formally, consider the task of testing the null hypothesis that there is no difference between the classification accuracies p_i [1]: H_0: p_1 = p_2 = \\cdots = p_L. Let \\{D_1, \\dots , D_L\\} be a set of classifiers that have all been tested on the same dataset. If the L classifiers don't perform differently, then the F statistic is distributed according to an F distribution with (L-1) and (L-1)\\times N degrees of freedom, where N is the number of examples in the test set. The calculation of the F statistic consists of several components, which are listed below (adapted from [3]). 
Sum of squares of the classifiers: SSA = N \\sum_{i=1}^{L} ACC_i^2 - N \\cdot L \\cdot ACC_{avg}^2, where ACC_i is the accuracy of classifier D_i on the test set and ACC_{avg} is the average accuracy of the L classifiers. Further, let L_j be the number of classifiers out of L that correctly classified object \\mathbf{z}_j \\in \\mathbf{Z}_{N} , where \\mathbf{Z}_{N} = \\{\\mathbf{z}_1, \\dots, \\mathbf{z}_{N}\\} is the test dataset on which the classifiers are evaluated. The sum of squares for the objects: SSB = \\frac{1}{L} \\sum_{j=1}^N (L_j)^2 - L\\cdot N \\cdot ACC_{avg}^2, where ACC_{avg} = \\frac{1}{L} \\sum_{i=1}^L ACC_i is the average of the accuracies of the different models. The total sum of squares: SST = L\\cdot N \\cdot ACC_{avg} (1 - ACC_{avg}). The sum of squares for the classification--object interaction: SSAB = SST - SSA - SSB. The mean SSA and mean SSAB values: MSA = \\frac{SSA}{L-1}, and MSAB = \\frac{SSAB}{(L-1) (N-1)}. From the MSA and MSAB, we can then calculate the F-value as F = \\frac{MSA}{MSAB}. After computing the F-value, we can then look up the p-value in an F-distribution table for the corresponding degrees of freedom or compute it as one minus the cumulative F-distribution function evaluated at F. In practice, if we successfully rejected the null hypothesis at a previously chosen significance threshold, we could perform multiple post hoc pair-wise tests -- for example, McNemar tests with a Bonferroni correction -- to determine which pairs have different population proportions.","title":"Overview"},{"location":"user_guide/evaluate/ftest/#references","text":"[1] Snedecor, George W. and Cochran, William G. (1989), Statistical Methods, Eighth Edition, Iowa State University Press. [2] Looney, Stephen W. \"A statistical technique for comparing the accuracies of several classifiers.\" Pattern Recognition Letters 8, no. 1 (1988): 5-9. [3] Kuncheva, Ludmila I. Combining pattern classifiers: methods and algorithms. 
John Wiley & Sons, 2004.","title":"References"},{"location":"user_guide/evaluate/ftest/#example-1-f-test","text":"import numpy as np from mlxtend.evaluate import ftest ## Dataset: # ground truth labels of the test dataset: y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) # predictions by 3 classifiers (`y_model_1`, `y_model_2`, and `y_model_3`): y_model_1 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_2 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_3 = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]) Assuming a significance level \\alpha=0.05 , we can conduct the F-test as follows to test the null hypothesis that there is no difference between the classification accuracies, H_0: p_1 = p_2 = \\cdots = p_L : f, p_value = ftest(y_true, y_model_1, y_model_2, y_model_3) print('F: %.3f' % f) print('p-value: %.3f' % p_value) F: 3.873 p-value: 0.022 Since the p-value is smaller than \\alpha , we can 
reject the null hypothesis and conclude that there is a difference between the classification accuracies. As mentioned in the introduction, we could now perform multiple post hoc pair-wise tests -- for example, McNemar tests with a Bonferroni correction -- to determine which pairs have different population proportions.","title":"Example 1 - F-test"},{"location":"user_guide/evaluate/ftest/#api","text":"ftest(y_target, *y_model_predictions) F-test to compare 2 or more models. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-likes, shape=[n_samples] Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy array. Returns f, p : float or None, float Returns the F-value and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/ftest/","title":"API"},{"location":"user_guide/evaluate/lift_score/","text":"Lift Score Scoring function to compute the LIFT metric, the ratio of the fraction of correct positive predictions (precision) to the fraction of actual positive examples in the test dataset. from mlxtend.evaluate import lift_score Overview In the context of classification, lift [1] compares model predictions to randomly generated predictions. Lift is often used in marketing research combined with gain and lift charts as a visual aid [2]. For example, assuming a 10% customer response as a baseline, a lift value of 3 would correspond to a 30% customer response when using the predictive model. Note that lift has the range \\lbrack 0, \\infty \\rbrack . There are many strategies to compute lift; below, we illustrate the computation of the lift score using a classic confusion matrix. 
For instance, let's assume the following prediction and target labels, where \"1\" is the positive class: \\text{true labels}: [0, 0, 1, 0, 0, 1, 1, 1, 1, 1] \\text{prediction}: [1, 0, 1, 0, 0, 0, 0, 1, 0, 0] Then, our confusion matrix would look as follows (TP = 2, FN = 4, FP = 1, TN = 3): Based on the confusion matrix above, with \"1\" as positive label, we compute lift as follows: \\text{lift} = \\frac{TP/(TP+FP)}{(TP+FN)/(TP+TN+FP+FN)} Plugging in the actual values from the example above, we arrive at the following lift value: \\frac{2/(2+1)}{(2+4)/(2+3+1+4)} = 1.1111111111111112 An alternative way of computing lift is by using the support metric [3]: \\text{lift} = \\frac{\\text{support}(\\text{true labels} \\cap \\text{prediction})}{\\text{support}(\\text{true labels}) \\times \\text{support}(\\text{prediction})}, Support is x / N , where x is the number of occurrences of an observation and N is the total number of samples in the dataset. \\text{true labels} \\cap \\text{prediction} are the true positives, true labels are true positives plus false negatives, and prediction are true positives plus false positives. Plugging the values from our example into the equation above, we arrive at: \\frac{2/10}{(6/10 \\times 3/10)} = 1.1111111111111112 References [1] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD '97), pages 265-276, 1997. [2] https://www3.nd.edu/~busiforc/Lift_chart.html [3] https://en.wikipedia.org/wiki/Association_rule_learning#Support Example 1 - Computing Lift This example demonstrates the basic use of the lift_score function using the example from the Overview section. 
import numpy as np from mlxtend.evaluate import lift_score y_target = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1]) y_predicted = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0]) lift_score(y_target, y_predicted) 1.1111111111111112 Example 2 - Using lift_score in GridSearch The lift_score function can also be used with scikit-learn objects, such as GridSearchCV : from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.model_selection import GridSearchCV from sklearn.svm import SVC from sklearn.metrics import make_scorer # make custom scorer lift_scorer = make_scorer(lift_score) iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, stratify=y, random_state=123) hyperparameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]}, {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}] clf = GridSearchCV(SVC(), hyperparameters, cv=10, scoring=lift_scorer) clf.fit(X_train, y_train) print(clf.best_score_) print(clf.best_params_) 3.0 {'gamma': 0.001, 'kernel': 'rbf', 'C': 1000} API lift_score(y_target, y_predicted, binary=True, positive_label=1) Lift measures the degree to which the predictions of a classification model are better than randomly-generated predictions. In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), the lift score is computed as: [ TP/(TP+FN) ] / [ (TP+FP) / (TP+TN+FP+FN) ] Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: True) Maps a multi-class problem onto a binary one, where the positive class is 1 and all other classes are 0. positive_label : int (default: 1) Class label of the positive class. 
Returns score : float Lift score in the range [0, \\infty ] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/lift_score/","title":"Lift Score"},{"location":"user_guide/evaluate/lift_score/#lift-score","text":"Scoring function to compute the LIFT metric, the ratio of the fraction of correct positive predictions (precision) to the fraction of actual positive examples in the test dataset. from mlxtend.evaluate import lift_score","title":"Lift Score"},{"location":"user_guide/evaluate/lift_score/#overview","text":"In the context of classification, lift [1] compares model predictions to randomly generated predictions. Lift is often used in marketing research combined with gain and lift charts as a visual aid [2]. For example, assuming a 10% customer response as a baseline, a lift value of 3 would correspond to a 30% customer response when using the predictive model. Note that lift has the range \\lbrack 0, \\infty \\rbrack . There are many strategies to compute lift; below, we illustrate the computation of the lift score using a classic confusion matrix. 
For instance, let's assume the following prediction and target labels, where \"1\" is the positive class: \\text{true labels}: [0, 0, 1, 0, 0, 1, 1, 1, 1, 1] \\text{prediction}: [1, 0, 1, 0, 0, 0, 0, 1, 0, 0] Then, our confusion matrix would look as follows (TP = 2, FN = 4, FP = 1, TN = 3): Based on the confusion matrix above, with \"1\" as positive label, we compute lift as follows: \\text{lift} = \\frac{TP/(TP+FP)}{(TP+FN)/(TP+TN+FP+FN)} Plugging in the actual values from the example above, we arrive at the following lift value: \\frac{2/(2+1)}{(2+4)/(2+3+1+4)} = 1.1111111111111112 An alternative way of computing lift is by using the support metric [3]: \\text{lift} = \\frac{\\text{support}(\\text{true labels} \\cap \\text{prediction})}{\\text{support}(\\text{true labels}) \\times \\text{support}(\\text{prediction})}, Support is x / N , where x is the number of occurrences of an observation and N is the total number of samples in the dataset. \\text{true labels} \\cap \\text{prediction} are the true positives, true labels are true positives plus false negatives, and prediction are true positives plus false positives. Plugging the values from our example into the equation above, we arrive at: \\frac{2/10}{(6/10 \\times 3/10)} = 1.1111111111111112","title":"Overview"},{"location":"user_guide/evaluate/lift_score/#references","text":"[1] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD '97), pages 265-276, 1997. [2] https://www3.nd.edu/~busiforc/Lift_chart.html [3] https://en.wikipedia.org/wiki/Association_rule_learning#Support","title":"References"},{"location":"user_guide/evaluate/lift_score/#example-1-computing-lift","text":"This example demonstrates the basic use of the lift_score function using the example from the Overview section. 
import numpy as np from mlxtend.evaluate import lift_score y_target = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1]) y_predicted = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0]) lift_score(y_target, y_predicted) 1.1111111111111112","title":"Example 1 - Computing Lift"},{"location":"user_guide/evaluate/lift_score/#example-2-using-lift_score-in-gridsearch","text":"The lift_score function can also be used with scikit-learn objects, such as GridSearchCV : from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.model_selection import GridSearchCV from sklearn.svm import SVC from sklearn.metrics import make_scorer # make custom scorer lift_scorer = make_scorer(lift_score) iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, stratify=y, random_state=123) hyperparameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]}, {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}] clf = GridSearchCV(SVC(), hyperparameters, cv=10, scoring=lift_scorer) clf.fit(X_train, y_train) print(clf.best_score_) print(clf.best_params_) 3.0 {'gamma': 0.001, 'kernel': 'rbf', 'C': 1000}","title":"Example 2 - Using lift_score in GridSearch"},{"location":"user_guide/evaluate/lift_score/#api","text":"lift_score(y_target, y_predicted, binary=True, positive_label=1) Lift measures the degree to which the predictions of a classification model are better than randomly-generated predictions. In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), the lift score is computed as: [ TP/(TP+FN) ] / [ (TP+FP) / (TP+TN+FP+FN) ] Parameters y_target : array-like, shape=[n_samples] True class labels. y_predicted : array-like, shape=[n_samples] Predicted class labels. binary : bool (default: True) Maps a multi-class problem onto a binary one, where the positive class is 1 and all other classes are 0. 
positive_label : int (default: 1) Class label of the positive class. Returns score : float Lift score in the range [0, \\infty ] Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/lift_score/","title":"API"},{"location":"user_guide/evaluate/mcnemar/","text":"McNemar's Test McNemar's test for paired nominal data from mlxtend.evaluate import mcnemar Overview McNemar's Test [1] (sometimes also called \"within-subjects chi-squared test\") is a statistical test for paired nominal data. In the context of machine learning (or statistical) models, we can use McNemar's Test to compare the predictive accuracy of two models. McNemar's test is based on a 2x2 contingency table of the two models' predictions. McNemar's Test Statistic In McNemar's Test, we formulate the null hypothesis that the probabilities p(b) and p(c) are the same, or in simplified terms: Neither of the two models performs better than the other. Thus, the alternative hypothesis is that the performances of the two models are not equal. The McNemar test statistic (\"chi-squared\") can be computed as follows: \\chi^2 = \\frac{(b - c)^2}{(b + c)}, If the sum of cells b and c is sufficiently large, the \\chi^2 value approximately follows a chi-squared distribution with one degree of freedom under the null hypothesis. After setting a significance threshold, e.g., \\alpha=0.05 , we can compute the p-value -- assuming that the null hypothesis is true, the p-value is the probability of observing this empirical (or a larger) chi-squared value. If the p-value is lower than our chosen significance level, we can reject the null hypothesis that the two models' performances are equal. Continuity Correction Approximately 1 year after Quinn McNemar published the McNemar Test [1], Edwards [2] proposed a continuity corrected version, which is the more commonly used variant today: \\chi^2 = \\frac{( \\mid b - c \\mid - 1)^2}{(b + c)}. 
Exact p-values As mentioned earlier, an exact binomial test is recommended for small sample sizes ( b + c < 25 [3]), since the chi-squared value may not be well-approximated by the chi-squared distribution. The exact p-value can be computed as follows: p = 2 \\sum^{n}_{i=b} \\binom{n}{i} 0.5^i (1 - 0.5)^{n-i}, where n = b + c , b is the larger of the two off-diagonal counts, and the factor 2 is used to compute the two-sided p-value. Example For instance, given two models with accuracies of 99.7% and 99.6%, a 2x2 contingency table can provide further insights for model selection. In both subfigure A and B, the predictive accuracies of the two models are as follows: model 1 accuracy: 9,960 / 10,000 = 99.6% model 2 accuracy: 9,970 / 10,000 = 99.7% Now, in subfigure A, we can see that model 2 got 11 predictions right that model 1 got wrong. Conversely, model 1 got 1 prediction right that model 2 got wrong. Thus, based on this 11:1 ratio, we may conclude that model 2 performs substantially better than model 1. However, in subfigure B, the ratio is 25:15, which is less conclusive about which model is the better one to choose. In the following coding examples, we will use these 2 scenarios A and B to illustrate McNemar's test. References [1] McNemar, Quinn, 1947. \"Note on the sampling error of the difference between correlated proportions or percentages\". Psychometrika. 12 (2): 153\u2013157. [2] Edwards AL: Note on the \u201ccorrection for continuity\u201d in testing the significance of the difference between correlated proportions. Psychometrika. 1948, 13 (3): 185-187. 10.1007/BF02289261. [3] https://en.wikipedia.org/wiki/McNemar%27s_test Example 1 - Creating 2x2 Contingency tables The mcnemar function expects a 2x2 contingency table as a NumPy array that is formatted as follows: Such a contingency matrix can be created by using the mcnemar_table function from mlxtend.evaluate . 
For example: import numpy as np from mlxtend.evaluate import mcnemar_table # The correct target (class) labels y_target = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) # Class labels predicted by model 1 y_model1 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) # Class labels predicted by model 2 y_model2 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) tb = mcnemar_table(y_target=y_target, y_model1=y_model1, y_model2=y_model2) print(tb) [[4 1] [2 3]] Example 2 - McNemar's Test for Scenario B Now, let us continue with the example mentioned in the overview section and assume that we already computed the 2x2 contingency table: import numpy as np tb_b = np.array([[9945, 25], [15, 15]]) To test the null hypothesis that the predictive performance of the two models is equal (using a significance level of \\alpha=0.05 ), we can conduct a corrected McNemar Test for computing the chi-squared and p-value as follows: from mlxtend.evaluate import mcnemar chi2, p = mcnemar(ary=tb_b, corrected=True) print('chi-squared:', chi2) print('p-value:', p) chi-squared: 2.025 p-value: 0.154728923485 Since the p-value is larger than our assumed significance threshold ( \\alpha=0.05 ), we cannot reject the null hypothesis; that is, the data provide no significant evidence of a difference between the two predictive models. Example 3 - McNemar's Test for Scenario A In contrast to scenario B (Example 2), the sample size in scenario A is relatively small (b + c = 11 + 1 = 12), which is below the 25 recommended in [3] for the chi-squared distribution to approximate the test statistic well. 
In this case, we need to compute the exact p-value from the binomial distribution: from mlxtend.evaluate import mcnemar import numpy as np tb_a = np.array([[9959, 11], [1, 29]]) chi2, p = mcnemar(ary=tb_a, exact=True) print('chi-squared:', chi2) print('p-value:', p) chi-squared: None p-value: 0.005859375 Assuming that we conducted this test also with a significance level of \\alpha=0.05 , we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value ( p \\approx 0.006 ) is smaller than \\alpha . API mcnemar(ary, corrected=True, exact=False) McNemar test for paired nominal data Parameters ary : array-like, shape=[2, 2] 2 x 2 contingency table (as returned by evaluate.mcnemar_table), where a: ary[0, 0]: # of samples that both models predicted correctly b: ary[0, 1]: # of samples that model 1 got right and model 2 got wrong c: ary[1, 0]: # of samples that model 2 got right and model 1 got wrong d: ary[1, 1]: # of samples that both models predicted incorrectly corrected : bool (default: True) Uses Edwards' continuity correction for chi-squared if True exact : bool (default: False) If True , uses an exact binomial test comparing b to a binomial distribution with n = b + c and p = 0.5. It is highly recommended to use exact=True for sample sizes b + c < 25, since in that case the test statistic is not well-approximated by the chi-squared distribution! 
Returns chi2, p : float or None, float Returns the chi-squared value and the p-value; if exact=True (default: False ), chi2 is None Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/","title":"McNemar's Test"},{"location":"user_guide/evaluate/mcnemar/#mcnemars-test","text":"McNemar's test for paired nominal data from mlxtend.evaluate import mcnemar","title":"McNemar's Test"},{"location":"user_guide/evaluate/mcnemar/#overview","text":"McNemar's Test [1] (sometimes also called \"within-subjects chi-squared test\") is a statistical test for paired nominal data. In the context of machine learning (or statistical) models, we can use McNemar's Test to compare the predictive accuracy of two models. McNemar's test is based on a 2x2 contingency table of the two models' predictions.","title":"Overview"},{"location":"user_guide/evaluate/mcnemar/#mcnemars-test-statistic","text":"In McNemar's Test, we formulate the null hypothesis that the probabilities p(b) and p(c) are the same, or in simplified terms: Neither of the two models performs better than the other. Thus, the alternative hypothesis is that the performances of the two models are not equal. The McNemar test statistic (\"chi-squared\") can be computed as follows: \\chi^2 = \\frac{(b - c)^2}{(b + c)}, If the sum of cells b and c is sufficiently large, the \\chi^2 value approximately follows a chi-squared distribution with one degree of freedom under the null hypothesis. After setting a significance threshold, e.g., \\alpha=0.05 , we can compute the p-value -- assuming that the null hypothesis is true, the p-value is the probability of observing this empirical (or a larger) chi-squared value. 
If the p-value is lower than our chosen significance level, we can reject the null hypothesis that the two model's performances are equal.","title":"McNemar's Test Statistic"},{"location":"user_guide/evaluate/mcnemar/#continuity-correction","text":"Approximately 1 year after Quinn McNemar published the McNemar Test [1], Edwards [2] proposed a continuity corrected version, which is the more commonly used variant today: \\chi^2 = \\frac{( \\mid b - c \\mid - 1)^2}{(b + c)}.","title":"Continuity Correction"},{"location":"user_guide/evaluate/mcnemar/#exact-p-values","text":"As mentioned earlier, an exact binomial test is recommended for small sample sizes ( b + c < 25 [3]), since the chi-squared value is may not be well-approximated by the chi-squared distribution. The exact p-value can be computed as follows: p = 2 \\sum^{n}_{i=b} \\binom{n}{i} 0.5^i (1 - 0.5)^{n-i}, where n = b + c , and the factor 2 is used to compute the two-sided p-value.","title":"Exact p-values"},{"location":"user_guide/evaluate/mcnemar/#example","text":"For instance, given that 2 models have a accuracy of with a 99.7% and 99.6% a 2x2 contigency table can provide further insights for model selection. In both subfigure A and B, the predictive accuracies of the two models are as follows: model 1 accuracy: 9,960 / 10,000 = 99.6% model 2 accuracy: 9,970 / 10,000 = 99.7% Now, in subfigure A, we can see that model 2 got 11 predictions right that model 1 got wrong. Vice versa, model 2 got 1 prediction right that model 2 got wrong. Thus, based on this 11:1 ratio, we may conclude that model 2 performs substantially better than model 1. However, in subfigure B, the ratio is 25:15, which is less conclusive about which model is the better one to choose. In the following coding examples, we will use these 2 scenarios A and B to illustrate McNemar's test.","title":"Example"},{"location":"user_guide/evaluate/mcnemar/#references","text":"[1] McNemar, Quinn, 1947. 
\"Note on the sampling error of the difference between correlated proportions or percentages\". Psychometrika. 12 (2): 153\u2013157. [2] Edwards AL: Note on the \u201ccorrection for continuity\u201d in testing the significance of the difference between correlated proportions. Psychometrika. 1948, 13 (3): 185-187. 10.1007/BF02289261. [3] https://en.wikipedia.org/wiki/McNemar%27s_test","title":"References"},{"location":"user_guide/evaluate/mcnemar/#example-1-creating-2x2-contigency-tables","text":"The mcnemar function expects a 2x2 contingency table as a NumPy array that is formatted as follows: Such a contingency matrix can be created by using the mcnemar_table function from mlxtend.evaluate . For example: import numpy as np from mlxtend.evaluate import mcnemar_table # The correct target (class) labels y_target = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) # Class labels predicted by model 1 y_model1 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) # Class labels predicted by model 2 y_model2 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) tb = mcnemar_table(y_target=y_target, y_model1=y_model1, y_model2=y_model2) print(tb) [[4 1] [2 3]]","title":"Example 1 - Creating 2x2 Contigency tables"},{"location":"user_guide/evaluate/mcnemar/#example-2-mcnemars-test-for-scenario-b","text":"Now, let us continue with the example mentioned in the overview section and assume that we already computed the 2x2 contingency table: import numpy as np tb_b = np.array([[9945, 25], [15, 15]]) To test the null hypothesis that the predictive performances of the two models are equal (using a significance level of \\alpha=0.05 ), we can conduct a corrected McNemar test to compute the chi-squared value and the p-value as follows: from mlxtend.evaluate import mcnemar chi2, p = mcnemar(ary=tb_b, corrected=True) print('chi-squared:', chi2) print('p-value:', p) chi-squared: 2.025 p-value: 0.154728923485 Since the p-value is larger than our assumed significance threshold ( \\alpha=0.05 ), we cannot reject the null hypothesis 
and thus have no evidence of a significant difference between the two predictive models.","title":"Example 2 - McNemar's Test for Scenario B"},{"location":"user_guide/evaluate/mcnemar/#example-3-mcnemars-test-for-scenario-a","text":"In contrast to scenario B (Example 2), the sample size in scenario A is relatively small (b + c = 11 + 1 = 12), below the threshold of 25 [3] that is recommended for the chi-squared distribution to approximate the computed chi-squared value well. In this case, we need to compute the exact p-value from the binomial distribution: from mlxtend.evaluate import mcnemar import numpy as np tb_a = np.array([[9959, 11], [1, 29]]) chi2, p = mcnemar(ary=tb_a, exact=True) print('chi-squared:', chi2) print('p-value:', p) chi-squared: None p-value: 0.005859375 Assuming that we conducted this test also with a significance level of \\alpha=0.05 , we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value ( p \\approx 0.006 ) is smaller than \\alpha .","title":"Example 3 - McNemar's Test for Scenario A"},{"location":"user_guide/evaluate/mcnemar/#api","text":"mcnemar(ary, corrected=True, exact=False) McNemar test for paired nominal data Parameters ary : array-like, shape=[2, 2] 2x2 contingency table (as returned by evaluate.mcnemar_table), where a: ary[0, 0]: # of samples that both models predicted correctly b: ary[0, 1]: # of samples that model 1 got right and model 2 got wrong c: ary[1, 0]: # of samples that model 2 got right and model 1 got wrong d: ary[1, 1]: # of samples that both models predicted incorrectly corrected : bool (default: True) Uses Edwards' continuity correction for chi-squared if True exact : bool (default: False) If True , uses an exact binomial test comparing b to a binomial distribution with n = b + c and p = 0.5. It is highly recommended to use exact=True for sample sizes ( b + c ) < 25, since the chi-squared statistic is not well approximated by the chi-squared distribution in that case! 
Returns chi2, p : float or None, float Returns the chi-squared value and the p-value; if exact=True (default: False ), chi2 is None Examples For usage examples, please see [http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/](http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/)","title":"API"},{"location":"user_guide/evaluate/mcnemar_table/","text":"Contigency Table for McNemar's Test Function to compute a 2x2 contingency table for McNemar's Test from mlxtend.evaluate import mcnemar_table Overview Contigency Table for McNemar's Test A 2x2 contingency table, as used in McNemar's test ( mlxtend.evaluate.mcnemar ), is a useful aid for comparing two different models. In contrast to a typical confusion matrix, this table compares two models to each other rather than showing the false positives, true positives, false negatives, and true negatives of a single model's predictions: For instance, given two models with accuracies of 99.7% and 99.6%, a 2x2 contingency table can provide further insights for model selection. In both subfigures A and B, the predictive accuracies of the two models are as follows: model 1 accuracy: 9,960 / 10,000 = 99.6% model 2 accuracy: 9,970 / 10,000 = 99.7% Now, in subfigure A, we can see that model 2 got 11 predictions right that model 1 got wrong. Conversely, model 1 got only 1 prediction right that model 2 got wrong. Thus, based on this 11:1 ratio, we may conclude that model 2 performs substantially better than model 1. However, in subfigure B, the ratio is 25:15, which is less conclusive about which model is the better one to choose. References McNemar, Quinn, 1947. \"Note on the sampling error of the difference between correlated proportions or percentages\". Psychometrika. 12 (2): 153\u2013157. Edwards AL: Note on the \u201ccorrection for continuity\u201d in testing the significance of the difference between correlated proportions. Psychometrika. 1948, 13 (3): 185-187. 10.1007/BF02289261. 
https://en.wikipedia.org/wiki/McNemar%27s_test Example 2 - 2x2 Contigency Table import numpy as np from mlxtend.evaluate import mcnemar_table y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod1 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) tb = mcnemar_table(y_target=y_true, y_model1=y_mod1, y_model2=y_mod2) tb array([[4, 1], [2, 3]]) To visualize (and better interpret) the contingency table with matplotlib, we can use the checkerboard_plot function: from mlxtend.plotting import checkerboard_plot import matplotlib.pyplot as plt brd = checkerboard_plot(tb, figsize=(3, 3), fmt='%d', col_labels=['model 2 wrong', 'model 2 right'], row_labels=['model 1 wrong', 'model 1 right']) plt.show() API mcnemar_table(y_target, y_model1, y_model2) Compute a 2x2 contingency table for McNemar's test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. y_model1 : array-like, shape=[n_samples] Predicted class labels from model 1 as 1D NumPy array. y_model2 : array-like, shape=[n_samples] Predicted class labels from model 2 as 1D NumPy array. 
Returns tb : array-like, shape=[2, 2] 2x2 contingency table with the following contents: a: tb[0, 0]: # of samples that both models predicted correctly b: tb[0, 1]: # of samples that model 1 got right and model 2 got wrong c: tb[1, 0]: # of samples that model 2 got right and model 1 got wrong d: tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_table/","title":"Contigency Table for McNemar's Test"},{"location":"user_guide/evaluate/mcnemar_table/#contigency-table-for-mcnemars-test","text":"Function to compute a 2x2 contingency table for McNemar's Test from mlxtend.evaluate import mcnemar_table","title":"Contigency Table for McNemar's Test"},{"location":"user_guide/evaluate/mcnemar_table/#overview","text":"","title":"Overview"},{"location":"user_guide/evaluate/mcnemar_table/#contigency-table-for-mcnemars-test_1","text":"A 2x2 contingency table, as used in McNemar's test ( mlxtend.evaluate.mcnemar ), is a useful aid for comparing two different models. In contrast to a typical confusion matrix, this table compares two models to each other rather than showing the false positives, true positives, false negatives, and true negatives of a single model's predictions: For instance, given two models with accuracies of 99.7% and 99.6%, a 2x2 contingency table can provide further insights for model selection. In both subfigures A and B, the predictive accuracies of the two models are as follows: model 1 accuracy: 9,960 / 10,000 = 99.6% model 2 accuracy: 9,970 / 10,000 = 99.7% Now, in subfigure A, we can see that model 2 got 11 predictions right that model 1 got wrong. Conversely, model 1 got only 1 prediction right that model 2 got wrong. Thus, based on this 11:1 ratio, we may conclude that model 2 performs substantially better than model 1. 
However, in subfigure B, the ratio is 25:15, which is less conclusive about which model is the better one to choose.","title":"Contigency Table for McNemar's Test"},{"location":"user_guide/evaluate/mcnemar_table/#references","text":"McNemar, Quinn, 1947. \"Note on the sampling error of the difference between correlated proportions or percentages\". Psychometrika. 12 (2): 153\u2013157. Edwards AL: Note on the \u201ccorrection for continuity\u201d in testing the significance of the difference between correlated proportions. Psychometrika. 1948, 13 (3): 185-187. 10.1007/BF02289261. https://en.wikipedia.org/wiki/McNemar%27s_test","title":"References"},{"location":"user_guide/evaluate/mcnemar_table/#example-2-2x2-contigency-table","text":"import numpy as np from mlxtend.evaluate import mcnemar_table y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod1 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) tb = mcnemar_table(y_target=y_true, y_model1=y_mod1, y_model2=y_mod2) tb array([[4, 1], [2, 3]]) To visualize (and better interpret) the contingency table with matplotlib, we can use the checkerboard_plot function: from mlxtend.plotting import checkerboard_plot import matplotlib.pyplot as plt brd = checkerboard_plot(tb, figsize=(3, 3), fmt='%d', col_labels=['model 2 wrong', 'model 2 right'], row_labels=['model 1 wrong', 'model 1 right']) plt.show()","title":"Example 2 - 2x2 Contigency Table"},{"location":"user_guide/evaluate/mcnemar_table/#api","text":"mcnemar_table(y_target, y_model1, y_model2) Compute a 2x2 contingency table for McNemar's test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. y_model1 : array-like, shape=[n_samples] Predicted class labels from model 1 as 1D NumPy array. y_model2 : array-like, shape=[n_samples] Predicted class labels from model 2 as 1D NumPy array. 
Returns tb : array-like, shape=[2, 2] 2x2 contingency table with the following contents: a: tb[0, 0]: # of samples that both models predicted correctly b: tb[0, 1]: # of samples that model 1 got right and model 2 got wrong c: tb[1, 0]: # of samples that model 2 got right and model 1 got wrong d: tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_table/","title":"API"},{"location":"user_guide/evaluate/mcnemar_tables/","text":"Contigency Tables for McNemar's Test and Cochran's Q Test Function to compute 2x2 contingency tables for McNemar's Test and Cochran's Q Test from mlxtend.evaluate import mcnemar_tables Overview Contigency Tables A 2x2 contingency table, as used in McNemar's test ( mlxtend.evaluate.mcnemar ), is a useful aid for comparing two different models. In contrast to a typical confusion matrix, this table compares two models to each other rather than showing the false positives, true positives, false negatives, and true negatives of a single model's predictions: For instance, given two models with accuracies of 99.7% and 99.6%, a 2x2 contingency table can provide further insights for model selection. In both subfigures A and B, the predictive accuracies of the two models are as follows: model 1 accuracy: 9,960 / 10,000 = 99.6% model 2 accuracy: 9,970 / 10,000 = 99.7% Now, in subfigure A, we can see that model 2 got 11 predictions right that model 1 got wrong. Conversely, model 1 got only 1 prediction right that model 2 got wrong. Thus, based on this 11:1 ratio, we may conclude that model 2 performs substantially better than model 1. However, in subfigure B, the ratio is 25:15, which is less conclusive about which model is the better one to choose. References McNemar, Quinn, 1947. \"Note on the sampling error of the difference between correlated proportions or percentages\". Psychometrika. 12 (2): 153\u2013157. 
Edwards AL: Note on the \u201ccorrection for continuity\u201d in testing the significance of the difference between correlated proportions. Psychometrika. 1948, 13 (3): 185-187. 10.1007/BF02289261. https://en.wikipedia.org/wiki/McNemar%27s_test Example 1 - Single 2x2 Contigency Table import numpy as np from mlxtend.evaluate import mcnemar_tables y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) tb = mcnemar_tables(y_true, y_mod0, y_mod1) tb {'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]])} To visualize (and better interpret) the contingency table with matplotlib, we can use the checkerboard_plot function: from mlxtend.plotting import checkerboard_plot import matplotlib.pyplot as plt brd = checkerboard_plot(tb['model_0 vs model_1'], figsize=(3, 3), fmt='%d', col_labels=['model 2 wrong', 'model 2 right'], row_labels=['model 1 wrong', 'model 1 right']) plt.show() Example 2 - Multiple 2x2 Contigency Tables If more than two models are provided as input to the mcnemar_tables function, a 2x2 contingency table will be created for each pair of models: import numpy as np from mlxtend.evaluate import mcnemar_tables y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0]) tb = mcnemar_tables(y_true, y_mod0, y_mod1, y_mod2) for key, value in tb.items(): print(key, '\\n', value, '\\n') model_0 vs model_1 [[ 4. 1.] [ 2. 3.]] model_0 vs model_2 [[ 4. 2.] [ 2. 2.]] model_1 vs model_2 [[ 5. 1.] [ 0. 4.]] API mcnemar_tables(y_target, *y_model_predictions) Compute multiple 2x2 contingency tables for McNemar's test or Cochran's Q test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. *y_model_predictions : array-like, shape=[n_samples] Predicted class labels for a model; one such array is passed per model. 
Returns tables : dict Dictionary of NumPy arrays with shape=[2, 2]. Each dictionary key names the two models to be compared based on the order the models were passed as *y_model_predictions . The number of dictionary entries is equal to the number of pairwise combinations between the m models, i.e., \"m choose 2.\" For example, the following target array (containing the true labels) and 3 models y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0]) would result in the following dictionary: {'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]]), 'model_0 vs model_2': array([[ 3., 0.], [ 3., 4.]]), 'model_1 vs model_2': array([[ 3., 0.], [ 2., 5.]])} Each array is structured in the following way: tb[0, 0]: # of samples that both models predicted correctly tb[0, 1]: # of samples that model a got right and model b got wrong tb[1, 0]: # of samples that model b got right and model a got wrong tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see [http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/](http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/)","title":"Contigency Tables for McNemar's Test and Cochran's Q Test"},{"location":"user_guide/evaluate/mcnemar_tables/#contigency-tables-for-mcnemars-test-and-cochrans-q-test","text":"Function to compute 2x2 contingency tables for McNemar's Test and Cochran's Q Test from mlxtend.evaluate import mcnemar_tables","title":"Contigency Tables for McNemar's Test and Cochran's Q Test"},{"location":"user_guide/evaluate/mcnemar_tables/#overview","text":"","title":"Overview"},{"location":"user_guide/evaluate/mcnemar_tables/#contigency-tables","text":"A 2x2 contingency table, as used in McNemar's test ( mlxtend.evaluate.mcnemar ), is a useful aid for comparing two different models. 
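The counting behind these tables can be sketched in plain Python without NumPy. mcnemar_table_sketch and mcnemar_tables_sketch are hypothetical helpers, not mlxtend functions; they follow the a/b/c/d layout described in the API docstrings on this page (row 0: model 1 correct, column 0: model 2 correct) and build one table per pair of models via itertools.combinations, i.e., m choose 2 tables for m models. Note that the printed outputs in the examples on this page show the two off-diagonal counts in swapped positions relative to that layout (for example, [[4, 1], [2, 3]] instead of [[4, 2], [1, 3]] for the first pair), so it is worth checking the convention of your installed mlxtend version; the McNemar chi-squared statistic depends only on |b - c| and b + c and is unaffected by the swap.

```python
from itertools import combinations


def mcnemar_table_sketch(y_target, y_model1, y_model2):
    # Hypothetical helper; not the mlxtend API.
    # tb[0][0]: both correct, tb[0][1]: model 1 right / model 2 wrong,
    # tb[1][0]: model 2 right / model 1 wrong, tb[1][1]: both wrong
    tb = [[0, 0], [0, 0]]
    for target, pred1, pred2 in zip(y_target, y_model1, y_model2):
        row = 0 if pred1 == target else 1  # is model 1 correct?
        col = 0 if pred2 == target else 1  # is model 2 correct?
        tb[row][col] += 1
    return tb


def mcnemar_tables_sketch(y_target, *y_model_predictions):
    # One table per pair of models (m choose 2 tables for m models);
    # the key format mimics the one printed in the examples on this page.
    return {
        f'model_{i} vs model_{j}': mcnemar_table_sketch(
            y_target, y_model_predictions[i], y_model_predictions[j])
        for i, j in combinations(range(len(y_model_predictions)), 2)
    }


y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_mod0 = [0, 1, 0, 0, 0, 1, 1, 0, 0, 0]
y_mod1 = [0, 0, 1, 1, 0, 1, 1, 0, 0, 0]
y_mod2 = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]

for key, table in mcnemar_tables_sketch(y_true, y_mod0, y_mod1, y_mod2).items():
    print(key, table)
# model_0 vs model_1 [[4, 2], [1, 3]]
# model_0 vs model_2 [[4, 2], [2, 2]]
# model_1 vs model_2 [[5, 0], [1, 4]]
```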
In contrast to a typical confusion matrix, this table compares two models to each other rather than showing the false positives, true positives, false negatives, and true negatives of a single model's predictions: For instance, given two models with accuracies of 99.7% and 99.6%, a 2x2 contingency table can provide further insights for model selection. In both subfigures A and B, the predictive accuracies of the two models are as follows: model 1 accuracy: 9,960 / 10,000 = 99.6% model 2 accuracy: 9,970 / 10,000 = 99.7% Now, in subfigure A, we can see that model 2 got 11 predictions right that model 1 got wrong. Conversely, model 1 got only 1 prediction right that model 2 got wrong. Thus, based on this 11:1 ratio, we may conclude that model 2 performs substantially better than model 1. However, in subfigure B, the ratio is 25:15, which is less conclusive about which model is the better one to choose.","title":"Contigency Tables"},{"location":"user_guide/evaluate/mcnemar_tables/#references","text":"McNemar, Quinn, 1947. \"Note on the sampling error of the difference between correlated proportions or percentages\". Psychometrika. 12 (2): 153\u2013157. Edwards AL: Note on the \u201ccorrection for continuity\u201d in testing the significance of the difference between correlated proportions. Psychometrika. 1948, 13 (3): 185-187. 10.1007/BF02289261. 
https://en.wikipedia.org/wiki/McNemar%27s_test","title":"References"},{"location":"user_guide/evaluate/mcnemar_tables/#example-1-single-2x2-contigency-table","text":"import numpy as np from mlxtend.evaluate import mcnemar_tables y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) tb = mcnemar_tables(y_true, y_mod0, y_mod1) tb {'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]])} To visualize (and better interpret) the contingency table with matplotlib, we can use the checkerboard_plot function: from mlxtend.plotting import checkerboard_plot import matplotlib.pyplot as plt brd = checkerboard_plot(tb['model_0 vs model_1'], figsize=(3, 3), fmt='%d', col_labels=['model 2 wrong', 'model 2 right'], row_labels=['model 1 wrong', 'model 1 right']) plt.show()","title":"Example 1 - Single 2x2 Contigency Table"},{"location":"user_guide/evaluate/mcnemar_tables/#example-2-multiple-2x2-contigency-tables","text":"If more than two models are provided as input to the mcnemar_tables function, a 2x2 contingency table will be created for each pair of models: import numpy as np from mlxtend.evaluate import mcnemar_tables y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0]) tb = mcnemar_tables(y_true, y_mod0, y_mod1, y_mod2) for key, value in tb.items(): print(key, '\\n', value, '\\n') model_0 vs model_1 [[ 4. 1.] [ 2. 3.]] model_0 vs model_2 [[ 4. 2.] [ 2. 2.]] model_1 vs model_2 [[ 5. 1.] [ 0. 4.]]","title":"Example 2 - Multiple 2x2 Contigency Tables"},{"location":"user_guide/evaluate/mcnemar_tables/#api","text":"mcnemar_tables(y_target, *y_model_predictions) Compute multiple 2x2 contingency tables for McNemar's test or Cochran's Q test. Parameters y_target : array-like, shape=[n_samples] True class labels as 1D NumPy array. 
*y_model_predictions : array-like, shape=[n_samples] Predicted class labels for a model; one such array is passed per model. Returns tables : dict Dictionary of NumPy arrays with shape=[2, 2]. Each dictionary key names the two models to be compared based on the order the models were passed as *y_model_predictions . The number of dictionary entries is equal to the number of pairwise combinations between the m models, i.e., \"m choose 2.\" For example, the following target array (containing the true labels) and 3 models y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]) y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0]) y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0]) y_mod2 = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0]) would result in the following dictionary: {'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]]), 'model_0 vs model_2': array([[ 3., 0.], [ 3., 4.]]), 'model_1 vs model_2': array([[ 3., 0.], [ 2., 5.]])} Each array is structured in the following way: tb[0, 0]: # of samples that both models predicted correctly tb[0, 1]: # of samples that model a got right and model b got wrong tb[1, 0]: # of samples that model b got right and model a got wrong tb[1, 1]: # of samples that both models predicted incorrectly Examples For usage examples, please see [http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/](http://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/)","title":"API"},{"location":"user_guide/evaluate/paired_ttest_5x2cv/","text":"5x2cv paired t test 5x2cv paired t test procedure to compare the performance of two models from mlxtend.evaluate import paired_ttest_5x2cv Overview The 5x2cv paired t test is a procedure for comparing the performance of two models (classifiers or regressors) that was proposed by Dietterich [1] to address shortcomings in other methods such as the resampled paired t test (see paired_ttest_resampled ) and the k-fold cross-validated paired t test (see paired_ttest_kfold_cv ). 
To explain how this method works, let's consider two estimators (e.g., classifiers) A and B. Further, we have a labeled dataset D . In the common hold-out method, we typically split the dataset into 2 parts: a training and a test set. In the 5x2cv paired t test, we repeat the splitting (50% training and 50% test data) 5 times. In each of the 5 iterations, we fit A and B to the training split and evaluate their performance ( p_A and p_B ) on the test split. Then, we rotate the training and test sets (the training set becomes the test set and vice versa) and compute the performance again, which results in 2 performance difference measures: p^{(1)} = p^{(1)}_A - p^{(1)}_B and p^{(2)} = p^{(2)}_A - p^{(2)}_B. Then, we estimate the mean and variance of the differences: \\overline{p} = \\frac{p^{(1)} + p^{(2)}}{2} and s^2 = (p^{(1)} - \\overline{p})^2 + (p^{(2)} - \\overline{p})^2. The variance s_i^2 is computed for each of the 5 iterations i and then used to compute the t statistic as follows: t = \\frac{p_1^{(1)}}{\\sqrt{(1/5) \\sum_{i=1}^{5}s_i^2}}, where p_1^{(1)} is the difference p^{(1)} from the very first iteration. Under the null hypothesis that the models A and B have equal performance, the t statistic approximately follows a t distribution with 5 degrees of freedom. Using the t statistic, the p value can be computed and compared with a previously chosen significance level, e.g., \\alpha=0.05 . If the p value is smaller than \\alpha , we reject the null hypothesis and accept that there is a significant difference between the two models. References [1] Dietterich TG (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput 10:1895\u20131923. 
Example 1 - 5x2cv paired t test Assume we want to compare two classification algorithms, logistic regression and a decision tree algorithm: from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from mlxtend.data import iris_data from sklearn.model_selection import train_test_split X, y = iris_data() clf1 = LogisticRegression(random_state=1) clf2 = DecisionTreeClassifier(random_state=1) X_train, X_test, y_train, y_test = \\ train_test_split(X, y, test_size=0.25, random_state=123) score1 = clf1.fit(X_train, y_train).score(X_test, y_test) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Logistic regression accuracy: %.2f%%' % (score1*100)) print('Decision tree accuracy: %.2f%%' % (score2*100)) Logistic regression accuracy: 97.37% Decision tree accuracy: 94.74% Note that these accuracy values are not used in the paired t test procedure, since new test/train splits are generated during resampling; the values above only serve to build intuition. Now, let's assume a significance threshold of \\alpha=0.05 for rejecting the null hypothesis that both algorithms perform equally well on the dataset and conduct the 5x2cv t test: from mlxtend.evaluate import paired_ttest_5x2cv t, p = paired_ttest_5x2cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) t statistic: -1.539 p value: 0.184 Since p > \\alpha , we cannot reject the null hypothesis; the difference in performance between the two algorithms is not statistically significant. 
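The reported p-value can be cross-checked from the t statistic with a few lines of plain Python. In this sketch, t5x2cv_statistic and t5_two_sided_pvalue are hypothetical helpers rather than mlxtend functions: the first implements the t statistic from the overview given five (p^{(1)}, p^{(2)}) pairs of performance differences, and the second evaluates the two-sided p-value with the closed-form CDF of the t distribution with 5 degrees of freedom, so no statistics library is needed.

```python
import math


def t5x2cv_statistic(differences):
    # Hypothetical helper; not the mlxtend API.
    # differences: five (p1, p2) pairs of performance differences p_A - p_B,
    # one pair per repetition (p1: first half as test set, p2: swapped).
    if len(differences) != 5:
        raise ValueError('expected five (p1, p2) pairs')
    variances = []
    for p1, p2 in differences:
        mean = (p1 + p2) / 2.0
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    # The numerator uses only p1 from the very first repetition.
    return differences[0][0] / math.sqrt(sum(variances) / 5.0)


def t5_two_sided_pvalue(t):
    # Two-sided p-value for Student's t distribution with 5 degrees of
    # freedom, via P(|T| < t) = (2 / pi) * (theta + s * c * (1 + 2/3 * c**2))
    # with theta = arctan(|t| / sqrt(5)), s = sin(theta), c = cos(theta).
    theta = math.atan(abs(t) / math.sqrt(5.0))
    s, c = math.sin(theta), math.cos(theta)
    central = (2.0 / math.pi) * (theta + s * c * (1.0 + 2.0 / 3.0 * c * c))
    return 1.0 - central


print(round(t5_two_sided_pvalue(-1.539), 3))  # 0.184

# Made-up differences, only to exercise the formula
diffs = [(0.01, 0.03)] * 5
print(round(t5x2cv_statistic(diffs), 4))  # 0.7071
```

Passing the printed t = -1.539 recovers the printed p-value of 0.184; the differences fed to t5x2cv_statistic are invented purely to show the arithmetic.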
While it is generally not recommended to apply statistical tests multiple times without correction for multiple hypothesis testing, let us take a look at an example where the decision tree algorithm is limited to producing a very simple decision boundary that would result in a relatively poor performance: clf2 = DecisionTreeClassifier(random_state=1, max_depth=1) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Decision tree accuracy: %.2f%%' % (score2*100)) t, p = paired_ttest_5x2cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) Decision tree accuracy: 63.16% t statistic: 5.386 p value: 0.003 Assuming that we conducted this test also with a significance level of \\alpha=0.05 , we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value ( p \\approx 0.003 ) is smaller than \\alpha . API paired_ttest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv paired t test proposed by Dietterich (1998) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. 
If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y) ; see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_5x2cv/","title":"5x2cv paired *t* test"},{"location":"user_guide/evaluate/paired_ttest_5x2cv/#5x2cv-paired-t-test","text":"5x2cv paired t test procedure to compare the performance of two models from mlxtend.evaluate import paired_ttest_5x2cv","title":"5x2cv paired t test"},{"location":"user_guide/evaluate/paired_ttest_5x2cv/#overview","text":"The 5x2cv paired t test is a procedure for comparing the performance of two models (classifiers or regressors) that was proposed by Dietterich [1] to address shortcomings in other methods such as the resampled paired t test (see paired_ttest_resampled ) and the k-fold cross-validated paired t test (see paired_ttest_kfold_cv ). To explain how this method works, let's consider two estimators (e.g., classifiers) A and B. Further, we have a labeled dataset D . In the common hold-out method, we typically split the dataset into 2 parts: a training and a test set. In the 5x2cv paired t test, we repeat the splitting (50% training and 50% test data) 5 times. In each of the 5 iterations, we fit A and B to the training split and evaluate their performance ( p_A and p_B ) on the test split. 
Then, we rotate the training and test sets (the training set becomes the test set and vice versa) and compute the performance again, which results in 2 performance difference measures: p^{(1)} = p^{(1)}_A - p^{(1)}_B and p^{(2)} = p^{(2)}_A - p^{(2)}_B. Then, we estimate the mean and variance of the differences: \\overline{p} = \\frac{p^{(1)} + p^{(2)}}{2} and s^2 = (p^{(1)} - \\overline{p})^2 + (p^{(2)} - \\overline{p})^2. The variance s_i^2 is computed for each of the 5 iterations i and then used to compute the t statistic as follows: t = \\frac{p_1^{(1)}}{\\sqrt{(1/5) \\sum_{i=1}^{5}s_i^2}}, where p_1^{(1)} is the difference p^{(1)} from the very first iteration. Under the null hypothesis that the models A and B have equal performance, the t statistic approximately follows a t distribution with 5 degrees of freedom. Using the t statistic, the p value can be computed and compared with a previously chosen significance level, e.g., \\alpha=0.05 . If the p value is smaller than \\alpha , we reject the null hypothesis and accept that there is a significant difference between the two models.","title":"Overview"},{"location":"user_guide/evaluate/paired_ttest_5x2cv/#references","text":"[1] Dietterich TG (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. 
Neural Comput 10:1895\u20131923.","title":"References"},{"location":"user_guide/evaluate/paired_ttest_5x2cv/#example-1-5x2cv-paired-t-test","text":"Assume we want to compare two classification algorithms, logistic regression and a decision tree algorithm: from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from mlxtend.data import iris_data from sklearn.model_selection import train_test_split X, y = iris_data() clf1 = LogisticRegression(random_state=1) clf2 = DecisionTreeClassifier(random_state=1) X_train, X_test, y_train, y_test = \\ train_test_split(X, y, test_size=0.25, random_state=123) score1 = clf1.fit(X_train, y_train).score(X_test, y_test) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Logistic regression accuracy: %.2f%%' % (score1*100)) print('Decision tree accuracy: %.2f%%' % (score2*100)) Logistic regression accuracy: 97.37% Decision tree accuracy: 94.74% Note that these accuracy values are not used in the paired t test procedure, since new test/train splits are generated during resampling; the values above only serve to build intuition. Now, let's assume a significance threshold of \\alpha=0.05 for rejecting the null hypothesis that both algorithms perform equally well on the dataset and conduct the 5x2cv t test: from mlxtend.evaluate import paired_ttest_5x2cv t, p = paired_ttest_5x2cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) t statistic: -1.539 p value: 0.184 Since p > \\alpha , we cannot reject the null hypothesis; the difference in performance between the two algorithms is not statistically significant. 
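The resampling scheme itself, five random 50/50 splits in which the two halves then swap roles, can be sketched as follows. five_by_two_splits and five_by_two_differences are hypothetical helpers written for illustration, not mlxtend's code; the sketch shuffles plain indices without stratifying by class label, and score_a and score_b stand in for whatever routine fits model A or B on the training indices and scores it on the test indices. The returned pairs are the p^{(1)}, p^{(2)} differences that enter the t statistic.

```python
import random


def five_by_two_splits(n_samples, random_seed=None):
    # Hypothetical helper illustrating the resampling scheme only; it is not
    # the mlxtend implementation and does not stratify by class label.
    rng = random.Random(random_seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    splits = []
    for _ in range(5):
        rng.shuffle(indices)
        fold_a, fold_b = sorted(indices[:half]), sorted(indices[half:])
        # (train, test) for the first fit, then the roles are swapped
        splits.append(((fold_a, fold_b), (fold_b, fold_a)))
    return splits


def five_by_two_differences(score_a, score_b, n_samples, random_seed=None):
    # score_a, score_b: hypothetical callables (train_idx, test_idx) -> score
    # that fit model A or B on train_idx and evaluate it on test_idx.
    diffs = []
    for (train1, test1), (train2, test2) in five_by_two_splits(n_samples, random_seed):
        p1 = score_a(train1, test1) - score_b(train1, test1)
        p2 = score_a(train2, test2) - score_b(train2, test2)
        diffs.append((p1, p2))
    return diffs


# Dummy scorers for illustration: model A always scores 0.9, model B 0.8.
diffs = five_by_two_differences(lambda tr, te: 0.9, lambda tr, te: 0.8,
                                n_samples=10, random_seed=1)
print(len(diffs))  # 5
```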
While it is generally not recommended to apply statistical tests multiple times without correction for multiple hypothesis testing, let us take a look at an example where the decision tree algorithm is limited to producing a very simple decision boundary that would result in a relatively bad performance: clf2 = DecisionTreeClassifier(random_state=1, max_depth=1) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Decision tree accuracy: %.2f%%' % (score2*100)) t, p = paired_ttest_5x2cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) Decision tree accuracy: 63.16% t statistic: 5.386 p value: 0.003 Assuming that we also conducted this test with a significance level of \\alpha=0.05 , we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value ( p = 0.003 ) is smaller than \\alpha .","title":"Example 1 - 5x2cv paired t test"},{"location":"user_guide/evaluate/paired_ttest_5x2cv/#api","text":"paired_ttest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None) Implements the 5x2cv paired t test proposed by Dietterich (1998) to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. 
If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y) ; see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_5x2cv/","title":"API"},{"location":"user_guide/evaluate/paired_ttest_kfold_cv/","text":"K-fold cross-validated paired t test K-fold paired t test procedure to compare the performance of two models from mlxtend.evaluate import paired_ttest_kfold_cv Overview The k-fold cross-validated paired t-test procedure is a common method for comparing the performance of two models (classifiers or regressors) and addresses some of the drawbacks of the resampled t-test procedure; however, this method still has the problem that the training sets overlap, and it is not recommended for use in practice [1]; techniques such as paired_ttest_5x2cv should be used instead. To explain how this method works, let's consider two estimators (e.g., classifiers) A and B. Further, we have a labeled dataset D . In the common hold-out method, we typically split the dataset into 2 parts: a training and a test set. In the k-fold cross-validated paired t-test procedure, we split the dataset into k parts of equal size, and each of these parts is then used for testing while the remaining k-1 parts (joined together) are used for training a classifier or regressor (i.e., the standard k-fold cross-validation procedure). 
In each k-fold cross-validation iteration, we then compute the difference in performance between A and B so that we obtain k difference measures. Now, by making the assumption that these k differences were independently drawn and follow an approximately normal distribution, we can compute the following t statistic with k-1 degrees of freedom according to Student's t test, under the null hypothesis that the models A and B have equal performance: t = \\frac{\\overline{p} \\sqrt{k}}{\\sqrt{\\sum_{i=1}^{k}(p^{(i)} - \\overline{p})^2 / (k-1)}}. Here, p^{(i)} denotes the difference between the model performances in the i th iteration, p^{(i)} = p^{(i)}_A - p^{(i)}_B , and \\overline{p} represents the average difference between the classifier performances, \\overline{p} = \\frac{1}{k} \\sum^k_{i=1} p^{(i)} . Once we have computed the t statistic, we can compute the p value and compare it to our chosen significance level, e.g., \\alpha=0.05 . If the p value is smaller than \\alpha , we reject the null hypothesis and accept that there is a significant difference between the two models. The problem with this method, and the reason why it is not recommended for use in practice, is that it violates the assumptions of Student's t test [1]: the differences between the model performances ( p^{(i)} = p^{(i)}_A - p^{(i)}_B ) are not normally distributed because p^{(i)}_A and p^{(i)}_B are not independent; and the p^{(i)} 's themselves are not independent because the training sets overlap. References [1] Dietterich TG (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput 10:1895\u20131923. 
Example 1 - K-fold cross-validated paired t test Assume we want to compare two classification algorithms, logistic regression and a decision tree algorithm: from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from mlxtend.data import iris_data from sklearn.model_selection import train_test_split X, y = iris_data() clf1 = LogisticRegression(random_state=1) clf2 = DecisionTreeClassifier(random_state=1) X_train, X_test, y_train, y_test = \\ train_test_split(X, y, test_size=0.25, random_state=123) score1 = clf1.fit(X_train, y_train).score(X_test, y_test) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Logistic regression accuracy: %.2f%%' % (score1*100)) print('Decision tree accuracy: %.2f%%' % (score2*100)) Logistic regression accuracy: 97.37% Decision tree accuracy: 94.74% Note that these accuracy values are not used in the paired t-test procedure, as new test/train splits are generated during the cross-validation procedure; the values above only serve to provide intuition. Now, let's assume a significance threshold of \\alpha=0.05 for rejecting the null hypothesis that both algorithms perform equally well on the dataset and conduct the k-fold cross-validated t-test: from mlxtend.evaluate import paired_ttest_kfold_cv t, p = paired_ttest_kfold_cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) t statistic: -1.861 p value: 0.096 Since p > \\alpha , we cannot reject the null hypothesis and may conclude that the performance of the two algorithms is not significantly different. 
While it is generally not recommended to apply statistical tests multiple times without correction for multiple hypothesis testing, let us take a look at an example where the decision tree algorithm is limited to producing a very simple decision boundary that would result in a relatively bad performance: clf2 = DecisionTreeClassifier(random_state=1, max_depth=1) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Decision tree accuracy: %.2f%%' % (score2*100)) t, p = paired_ttest_kfold_cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) Decision tree accuracy: 63.16% t statistic: 13.491 p value: 0.000 Assuming that we also conducted this test with a significance level of \\alpha=0.05 , we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value ( p < 0.001 ) is smaller than \\alpha . API paired_ttest_kfold_cv(estimator1, estimator2, X, y, cv=10, scoring=None, shuffle=False, random_seed=None) Implements the k-fold paired t test procedure to compare the performance of two models. Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. cv : int (default: 10) Number of splits and iterations for the cross-validation procedure. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. 
If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y) ; see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. shuffle : bool (default: False) Whether to shuffle the dataset for generating the k-fold splits. random_seed : int or None (default: None) Random seed for shuffling the dataset for generating the k-fold splits. Ignored if shuffle=False. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_kfold_cv/","title":"K-fold cross-validated paired *t* test"},{"location":"user_guide/evaluate/paired_ttest_kfold_cv/#k-fold-cross-validated-paired-t-test","text":"K-fold paired t test procedure to compare the performance of two models from mlxtend.evaluate import paired_ttest_kfold_cv","title":"K-fold cross-validated paired t test"},{"location":"user_guide/evaluate/paired_ttest_kfold_cv/#overview","text":"The k-fold cross-validated paired t-test procedure is a common method for comparing the performance of two models (classifiers or regressors) and addresses some of the drawbacks of the resampled t-test procedure; however, this method still has the problem that the training sets overlap, and it is not recommended for use in practice [1]; techniques such as paired_ttest_5x2cv should be used instead. To explain how this method works, let's consider two estimators (e.g., classifiers) A and B. Further, we have a labeled dataset D . In the common hold-out method, we typically split the dataset into 2 parts: a training and a test set. 
In the k-fold cross-validated paired t-test procedure, we split the dataset into k parts of equal size, and each of these parts is then used for testing while the remaining k-1 parts (joined together) are used for training a classifier or regressor (i.e., the standard k-fold cross-validation procedure). In each k-fold cross-validation iteration, we then compute the difference in performance between A and B so that we obtain k difference measures. Now, by making the assumption that these k differences were independently drawn and follow an approximately normal distribution, we can compute the following t statistic with k-1 degrees of freedom according to Student's t test, under the null hypothesis that the models A and B have equal performance: t = \\frac{\\overline{p} \\sqrt{k}}{\\sqrt{\\sum_{i=1}^{k}(p^{(i)} - \\overline{p})^2 / (k-1)}}. Here, p^{(i)} denotes the difference between the model performances in the i th iteration, p^{(i)} = p^{(i)}_A - p^{(i)}_B , and \\overline{p} represents the average difference between the classifier performances, \\overline{p} = \\frac{1}{k} \\sum^k_{i=1} p^{(i)} . Once we have computed the t statistic, we can compute the p value and compare it to our chosen significance level, e.g., \\alpha=0.05 . If the p value is smaller than \\alpha , we reject the null hypothesis and accept that there is a significant difference between the two models. 
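The k-fold procedure and t statistic described above can be sketched in a few lines. This is a minimal illustration and not mlxtend's implementation. It assumes scikit-learn's KFold with shuffling enabled, because the iris data is ordered by class, and scipy.stats.t for the two-tailed p value. The helper name kfold_paired_t is hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier


def kfold_paired_t(est_a, est_b, X, y, k=10, seed=1):
    # Hypothetical helper (not mlxtend's API); shuffles before splitting.
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    diffs = []
    for train_idx, test_idx in kf.split(X):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        score_a = clone(est_a).fit(X_tr, y_tr).score(X_te, y_te)
        score_b = clone(est_b).fit(X_tr, y_tr).score(X_te, y_te)
        diffs.append(score_a - score_b)  # p^(i) = p_A^(i) - p_B^(i)
    diffs = np.asarray(diffs)
    p_bar = diffs.mean()
    # Sample standard deviation of the k differences (k - 1 in the denominator).
    sd = np.sqrt(np.sum((diffs - p_bar) ** 2) / (k - 1))
    t_stat = p_bar * np.sqrt(k) / sd
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=k - 1)
    return t_stat, p_value


X, y = load_iris(return_X_y=True)
t, p = kfold_paired_t(LogisticRegression(max_iter=1000),
                      DecisionTreeClassifier(random_state=1), X, y)
print('t statistic: %.3f' % t)
print('p value: %.3f' % p)
```

Both models are always scored on the same fold. That pairing is what makes each p^(i) a paired difference.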
The problem with this method, and the reason why it is not recommended for use in practice, is that it violates the assumptions of Student's t test [1]: the differences between the model performances ( p^{(i)} = p^{(i)}_A - p^{(i)}_B ) are not normally distributed because p^{(i)}_A and p^{(i)}_B are not independent; and the p^{(i)} 's themselves are not independent because the training sets overlap.","title":"Overview"},{"location":"user_guide/evaluate/paired_ttest_kfold_cv/#references","text":"[1] Dietterich TG (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput 10:1895\u20131923.","title":"References"},{"location":"user_guide/evaluate/paired_ttest_kfold_cv/#example-1-k-fold-cross-validated-paired-t-test","text":"Assume we want to compare two classification algorithms, logistic regression and a decision tree algorithm: from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from mlxtend.data import iris_data from sklearn.model_selection import train_test_split X, y = iris_data() clf1 = LogisticRegression(random_state=1) clf2 = DecisionTreeClassifier(random_state=1) X_train, X_test, y_train, y_test = \\ train_test_split(X, y, test_size=0.25, random_state=123) score1 = clf1.fit(X_train, y_train).score(X_test, y_test) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Logistic regression accuracy: %.2f%%' % (score1*100)) print('Decision tree accuracy: %.2f%%' % (score2*100)) Logistic regression accuracy: 97.37% Decision tree accuracy: 94.74% Note that these accuracy values are not used in the paired t-test procedure, as new test/train splits are generated during the cross-validation procedure; the values above only serve to provide intuition. 
Now, let's assume a significance threshold of \\alpha=0.05 for rejecting the null hypothesis that both algorithms perform equally well on the dataset and conduct the k-fold cross-validated t-test: from mlxtend.evaluate import paired_ttest_kfold_cv t, p = paired_ttest_kfold_cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) t statistic: -1.861 p value: 0.096 Since p > \\alpha , we cannot reject the null hypothesis and may conclude that the performance of the two algorithms is not significantly different. While it is generally not recommended to apply statistical tests multiple times without correction for multiple hypothesis testing, let us take a look at an example where the decision tree algorithm is limited to producing a very simple decision boundary that would result in a relatively bad performance: clf2 = DecisionTreeClassifier(random_state=1, max_depth=1) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Decision tree accuracy: %.2f%%' % (score2*100)) t, p = paired_ttest_kfold_cv(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) Decision tree accuracy: 63.16% t statistic: 13.491 p value: 0.000 Assuming that we also conducted this test with a significance level of \\alpha=0.05 , we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value ( p < 0.001 ) is smaller than \\alpha .","title":"Example 1 - K-fold cross-validated paired t test"},{"location":"user_guide/evaluate/paired_ttest_kfold_cv/#api","text":"paired_ttest_kfold_cv(estimator1, estimator2, X, y, cv=10, scoring=None, shuffle=False, random_seed=None) Implements the k-fold paired t test procedure to compare the performance of two models. 
Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. cv : int (default: 10) Number of splits and iterations for the cross-validation procedure. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y) ; see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. shuffle : bool (default: False) Whether to shuffle the dataset for generating the k-fold splits. random_seed : int or None (default: None) Random seed for shuffling the dataset for generating the k-fold splits. Ignored if shuffle=False. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_kfold_cv/","title":"API"},{"location":"user_guide/evaluate/paired_ttest_resampled/","text":"Resampled paired t test Resampled paired t test procedure to compare the performance of two models from mlxtend.evaluate import paired_ttest_resampled Overview The resampled paired t test procedure (also called k-hold-out paired t test) is a popular method for comparing the performance of two models (classifiers or regressors); however, this method has many drawbacks and is not recommended for use in practice [1]; techniques such as paired_ttest_5x2cv should be used instead. To explain how this method works, let's consider two estimators (e.g., classifiers) A and B. Further, we have a labeled dataset D . In the common hold-out method, we typically split the dataset into 2 parts: a training and a test set. In the resampled paired t test procedure, we repeat this splitting procedure (with typically 2/3 training data and 1/3 test data) k times (usually 30). In each iteration, we train A and B on the training set and evaluate them on the test set. Then, we compute the difference in performance between A and B in each iteration so that we obtain k difference measures. Now, by making the assumption that these k differences were independently drawn and follow an approximately normal distribution, we can compute the following t statistic with k-1 degrees of freedom according to Student's t test, under the null hypothesis that the models A and B have equal performance: t = \\frac{\\overline{p} \\sqrt{k}}{\\sqrt{\\sum_{i=1}^{k}(p^{(i)} - \\overline{p})^2 / (k-1)}}. Here, p^{(i)} denotes the difference between the model performances in the i th iteration, p^{(i)} = p^{(i)}_A - p^{(i)}_B , and \\overline{p} represents the average difference between the classifier performances, \\overline{p} = \\frac{1}{k} \\sum^k_{i=1} p^{(i)} . 
Once we have computed the t statistic, we can compute the p value and compare it to our chosen significance level, e.g., \\alpha=0.05 . If the p value is smaller than \\alpha , we reject the null hypothesis and accept that there is a significant difference between the two models. To summarize the procedure: i := 0 while i < k: split dataset into training and test subsets fit models A and B to the training set compute the performances of A and B on the test set record the performance difference between A and B i := i + 1 compute t-statistic compute p value from t-statistic with k-1 degrees of freedom compare p value to chosen significance threshold. The problem with this method, and the reason why it is not recommended for use in practice, is that it violates the assumptions of Student's t test [1]: the differences between the model performances ( p^{(i)} = p^{(i)}_A - p^{(i)}_B ) are not normally distributed because p^{(i)}_A and p^{(i)}_B are not independent; and the p^{(i)} 's themselves are not independent because the test sets of different iterations overlap, and the training and test sets of different iterations overlap as well. References [1] Dietterich TG (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput 10:1895\u20131923. 
Example 1 - Resampled paired t test Assume we want to compare two classification algorithms, logistic regression and a decision tree algorithm: from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from mlxtend.data import iris_data from sklearn.model_selection import train_test_split X, y = iris_data() clf1 = LogisticRegression(random_state=1) clf2 = DecisionTreeClassifier(random_state=1) X_train, X_test, y_train, y_test = \\ train_test_split(X, y, test_size=0.25, random_state=123) score1 = clf1.fit(X_train, y_train).score(X_test, y_test) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Logistic regression accuracy: %.2f%%' % (score1*100)) print('Decision tree accuracy: %.2f%%' % (score2*100)) Logistic regression accuracy: 97.37% Decision tree accuracy: 94.74% Note that these accuracy values are not used in the paired t test procedure, as new test/train splits are generated during the resampling procedure; the values above only serve to provide intuition. Now, let's assume a significance threshold of \\alpha=0.05 for rejecting the null hypothesis that both algorithms perform equally well on the dataset and conduct the resampled paired t test: from mlxtend.evaluate import paired_ttest_resampled t, p = paired_ttest_resampled(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) t statistic: -1.809 p value: 0.081 Since p > \\alpha , we cannot reject the null hypothesis and may conclude that the performance of the two algorithms is not significantly different. 
While it is generally not recommended to apply statistical tests multiple times without correction for multiple hypothesis testing, let us take a look at an example where the decision tree algorithm is limited to producing a very simple decision boundary that would result in a relatively bad performance: clf2 = DecisionTreeClassifier(random_state=1, max_depth=1) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Decision tree accuracy: %.2f%%' % (score2*100)) t, p = paired_ttest_resampled(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) Decision tree accuracy: 63.16% t statistic: 39.214 p value: 0.000 Assuming that we also conducted this test with a significance level of \\alpha=0.05 , we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value ( p < 0.001 ) is smaller than \\alpha . API paired_ttest_resampled(estimator1, estimator2, X, y, num_rounds=30, test_size=0.3, scoring=None, random_seed=None) Implements the resampled paired t test procedure to compare the performance of two models (also called k-hold-out paired t test). Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. num_rounds : int (default: 30) Number of resampling iterations (i.e., train/test splits) test_size : float or int (default: 0.3) If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to use as a test set. If int, represents the absolute number of test examples. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. 
If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y) ; see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_resampled/","title":"Resampled paired *t* test"},{"location":"user_guide/evaluate/paired_ttest_resampled/#resampled-paired-t-test","text":"Resampled paired t test procedure to compare the performance of two models from mlxtend.evaluate import paired_ttest_resampled","title":"Resampled paired t test"},{"location":"user_guide/evaluate/paired_ttest_resampled/#overview","text":"The resampled paired t test procedure (also called k-hold-out paired t test) is a popular method for comparing the performance of two models (classifiers or regressors); however, this method has many drawbacks and is not recommended for use in practice [1]; techniques such as paired_ttest_5x2cv should be used instead. To explain how this method works, let's consider two estimators (e.g., classifiers) A and B. Further, we have a labeled dataset D . In the common hold-out method, we typically split the dataset into 2 parts: a training and a test set. In the resampled paired t test procedure, we repeat this splitting procedure (with typically 2/3 training data and 1/3 test data) k times (usually 30). 
In each iteration, we train A and B on the training set and evaluate them on the test set. Then, we compute the difference in performance between A and B in each iteration so that we obtain k difference measures. Now, by making the assumption that these k differences were independently drawn and follow an approximately normal distribution, we can compute the following t statistic with k-1 degrees of freedom according to Student's t test, under the null hypothesis that the models A and B have equal performance: t = \\frac{\\overline{p} \\sqrt{k}}{\\sqrt{\\sum_{i=1}^{k}(p^{(i)} - \\overline{p})^2 / (k-1)}}. Here, p^{(i)} denotes the difference between the model performances in the i th iteration, p^{(i)} = p^{(i)}_A - p^{(i)}_B , and \\overline{p} represents the average difference between the classifier performances, \\overline{p} = \\frac{1}{k} \\sum^k_{i=1} p^{(i)} . Once we have computed the t statistic, we can compute the p value and compare it to our chosen significance level, e.g., \\alpha=0.05 . If the p value is smaller than \\alpha , we reject the null hypothesis and accept that there is a significant difference between the two models. 
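Here is a minimal sketch of the resampled procedure described above, under stated assumptions. It draws k = 30 random hold-out splits with 1/3 test data, as in the text; mlxtend's paired_ttest_resampled defaults to test_size=0.3 instead. It uses scikit-learn's train_test_split for the splits and scipy.stats.ttest_1samp to evaluate the t statistic. The helper name resampled_paired_t is hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def resampled_paired_t(est_a, est_b, X, y, k=30, test_size=1 / 3, seed=1):
    # Hypothetical helper (not mlxtend's API): k independent hold-out splits.
    rng = np.random.RandomState(seed)
    diffs = np.empty(k)
    for i in range(k):
        # A fresh random split each round, so test sets overlap across rounds.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=rng.randint(0, 10000))
        score_a = clone(est_a).fit(X_tr, y_tr).score(X_te, y_te)
        score_b = clone(est_b).fit(X_tr, y_tr).score(X_te, y_te)
        diffs[i] = score_a - score_b
    # One-sample t test of the k differences against 0 (k - 1 degrees of freedom).
    result = stats.ttest_1samp(diffs, 0.0)
    return result.statistic, result.pvalue


X, y = load_iris(return_X_y=True)
t, p = resampled_paired_t(LogisticRegression(max_iter=1000),
                          DecisionTreeClassifier(random_state=1), X, y)
print('t statistic: %.3f' % t)
print('p value: %.3f' % p)
```

Testing the k differences against zero with a one-sample t test gives the same statistic as the formula above. Both divide the mean difference by its standard error, that is, s divided by the square root of k.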
To summarize the procedure: i := 0 while i < k: split dataset into training and test subsets fit models A and B to the training set compute the performances of A and B on the test set record the performance difference between A and B i := i + 1 compute t-statistic compute p value from t-statistic with k-1 degrees of freedom compare p value to chosen significance threshold. The problem with this method, and the reason why it is not recommended for use in practice, is that it violates the assumptions of Student's t test [1]: the differences between the model performances ( p^{(i)} = p^{(i)}_A - p^{(i)}_B ) are not normally distributed because p^{(i)}_A and p^{(i)}_B are not independent; and the p^{(i)} 's themselves are not independent because the test sets of different iterations overlap, and the training and test sets of different iterations overlap as well.","title":"Overview"},{"location":"user_guide/evaluate/paired_ttest_resampled/#references","text":"[1] Dietterich TG (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. 
Neural Comput 10:1895\u20131923.","title":"References"},{"location":"user_guide/evaluate/paired_ttest_resampled/#example-1-resampled-paired-t-test","text":"Assume we want to compare two classification algorithms, logistic regression and a decision tree algorithm: from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from mlxtend.data import iris_data from sklearn.model_selection import train_test_split X, y = iris_data() clf1 = LogisticRegression(random_state=1) clf2 = DecisionTreeClassifier(random_state=1) X_train, X_test, y_train, y_test = \\ train_test_split(X, y, test_size=0.25, random_state=123) score1 = clf1.fit(X_train, y_train).score(X_test, y_test) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Logistic regression accuracy: %.2f%%' % (score1*100)) print('Decision tree accuracy: %.2f%%' % (score2*100)) Logistic regression accuracy: 97.37% Decision tree accuracy: 94.74% Note that these accuracy values are not used in the paired t test procedure, as new test/train splits are generated during the resampling procedure; the values above only serve to provide intuition. Now, let's assume a significance threshold of \\alpha=0.05 for rejecting the null hypothesis that both algorithms perform equally well on the dataset and conduct the resampled paired t test: from mlxtend.evaluate import paired_ttest_resampled t, p = paired_ttest_resampled(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) t statistic: -1.809 p value: 0.081 Since p > \\alpha , we cannot reject the null hypothesis and may conclude that the performance of the two algorithms is not significantly different. 
While it is generally not recommended to apply statistical tests multiple times without correction for multiple hypothesis testing, let us take a look at an example where the decision tree algorithm is limited to producing a very simple decision boundary that would result in a relatively bad performance: clf2 = DecisionTreeClassifier(random_state=1, max_depth=1) score2 = clf2.fit(X_train, y_train).score(X_test, y_test) print('Decision tree accuracy: %.2f%%' % (score2*100)) t, p = paired_ttest_resampled(estimator1=clf1, estimator2=clf2, X=X, y=y, random_seed=1) print('t statistic: %.3f' % t) print('p value: %.3f' % p) Decision tree accuracy: 63.16% t statistic: 39.214 p value: 0.000 Assuming that we also conducted this test with a significance level of \\alpha=0.05 , we can reject the null hypothesis that both models perform equally well on this dataset, since the p-value ( p < 0.001 ) is smaller than \\alpha .","title":"Example 1 - Resampled paired t test"},{"location":"user_guide/evaluate/paired_ttest_resampled/#api","text":"paired_ttest_resampled(estimator1, estimator2, X, y, num_rounds=30, test_size=0.3, scoring=None, random_seed=None) Implements the resampled paired t test procedure to compare the performance of two models (also called k-hold-out paired t test). Parameters estimator1 : scikit-learn classifier or regressor estimator2 : scikit-learn classifier or regressor X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. num_rounds : int (default: 30) Number of resampling iterations (i.e., train/test splits) test_size : float or int (default: 0.3) If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to use as a test set. If int, represents the absolute number of test examples. 
scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y) ; see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. random_seed : int or None (default: None) Random seed for creating the test/train splits. Returns t : float The t-statistic pvalue : float Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences between the two compared models. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_resampled/","title":"API"},{"location":"user_guide/evaluate/permutation_test/","text":"Permutation Test An implementation of a permutation test for hypothesis testing -- testing the null hypothesis that two different groups come from the same distribution. from mlxtend.evaluate import permutation_test Overview Permutation tests (also called exact tests, randomization tests, or re-randomization tests) are nonparametric test procedures to test the null hypothesis that two different groups come from the same distribution. A permutation test can be used for significance or hypothesis testing (including A/B testing) without requiring any assumptions about the sampling distribution (e.g., it does not require the samples to be normally distributed). Under the null hypothesis (treatment = control), all permutations are equally likely. (Note that there are (n+m)! 
permutations, where n is the number of records in the treatment sample, and m is the number of records in the control sample). For a two-sided test, we define the alternative hypothesis that the two samples are different (e.g., treatment != control). 1. Compute the difference (here: mean) of sample x and sample y. 2. Combine all measurements into a single dataset. 3. Draw a permuted dataset from all possible permutations of the dataset from step 2. 4. Divide the permuted dataset into two datasets x' and y' of size n and m , respectively. 5. Compute the difference (here: mean) of sample x' and sample y' and record this difference. 6. Repeat steps 3-5 until all permutations are evaluated. 7. Return the p-value as the number of times the recorded differences were more extreme than the original difference from step 1, divided by the total number of permutations. Here, the p-value is defined as the probability, given the null hypothesis (no difference between the samples) is true, that we obtain results that are at least as extreme as the results we observed (i.e., the sample difference from step 1). More formally, we can express the computation of the p-value as follows ([2]): p(t > t_0) = \\frac{1}{(n+m)!} \\sum^{(n+m)!}_{j=1} I(t_j > t_0), where t_0 is the observed value of the test statistic (step 1 in the list above), t_j is the statistic computed from the j-th resample (step 5), here t(x'_1, x'_2, ..., x'_n, y'_1, y'_2, ..., y'_m) = |\\bar{x'} - \\bar{y'}| , and I is the indicator function. Given a significance level that we specify prior to carrying out the permutation test (e.g., alpha=0.05), we fail to reject the null hypothesis if the p-value is greater than alpha. Note that if the number of permutations is large, evaluating all of them may not be computationally feasible. Thus, a common approximation is to perform k rounds of random permutations (where k is typically a value between 1000 and 2000). References [1] Efron, Bradley and Tibshirani, R. 
J., An introduction to the bootstrap, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 1994. [2] Unpingco, Jos\u00e9. Python for probability, statistics, and machine learning. Springer, 2016. [3] Pitman, E. J. G., Significance tests which may be applied to samples from any population, Royal Statistical Society Supplement, 1937, 4: 119-30 and 225-32. Example 1 -- Two-sided permutation test Perform a two-sided permutation test to test the null hypothesis that two groups, \"treatment\" and \"control\", come from the same distribution. We specify alpha=0.01 as our significance level. treatment = [ 28.44, 29.32, 31.22, 29.58, 30.34, 28.76, 29.21, 30.4 , 31.12, 31.78, 27.58, 31.57, 30.73, 30.43, 30.31, 30.32, 29.18, 29.52, 29.22, 30.56] control = [ 33.51, 30.63, 32.38, 32.52, 29.41, 30.93, 49.78, 28.96, 35.77, 31.42, 30.76, 30.6 , 23.64, 30.54, 47.78, 31.98, 34.52, 32.42, 31.32, 40.72] Since evaluating all possible permutations may take a while, we will use the approximation method (see the introduction for details): from mlxtend.evaluate import permutation_test p_value = permutation_test(treatment, control, method='approximate', num_rounds=10000, seed=0) print(p_value) 0.0066 Since p-value < alpha, we can reject the null hypothesis that the two samples come from the same distribution. 
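The approximate procedure from the overview (k random permutations instead of all (n+m)! of them) can be sketched from scratch in a few lines of NumPy. This is a minimal sketch, assuming a two-sided test on the absolute difference of means and the strict inequality used in the p-value formula above. The function name approx_permutation_test is hypothetical and not part of mlxtend. Because it draws its permutations differently from permutation_test, its p-value for the treatment/control data should land close to, but not necessarily exactly at, the 0.0066 reported above.

```python
import numpy as np


def approx_permutation_test(x, y, num_rounds=10000, seed=None):
    # Hypothetical helper, not part of mlxtend: two-sided approximate
    # permutation test on the absolute difference of sample means.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.RandomState(seed)

    # Step 1: observed test statistic t_0
    observed = abs(x.mean() - y.mean())

    # Step 2: pool all measurements into a single dataset
    combined = np.concatenate([x, y])
    n = len(x)

    count = 0
    for _ in range(num_rounds):
        # Step 3: draw a random permutation of the pooled data
        permuted = rng.permutation(combined)
        # Step 4: split into x' (size n) and y' (size m)
        x_perm, y_perm = permuted[:n], permuted[n:]
        # Step 5: recompute the statistic on the resample
        t_j = abs(x_perm.mean() - y_perm.mean())
        # Indicator I(t_j > t_0), matching the strict inequality in the formula
        if t_j > observed:
            count += 1

    # p-value: fraction of resampled statistics more extreme than t_0
    return count / num_rounds


treatment = [28.44, 29.32, 31.22, 29.58, 30.34, 28.76, 29.21, 30.4, 31.12, 31.78,
             27.58, 31.57, 30.73, 30.43, 30.31, 30.32, 29.18, 29.52, 29.22, 30.56]
control = [33.51, 30.63, 32.38, 32.52, 29.41, 30.93, 49.78, 28.96, 35.77, 31.42,
           30.76, 30.6, 23.64, 30.54, 47.78, 31.98, 34.52, 32.42, 31.32, 40.72]

p_value = approx_permutation_test(treatment, control, num_rounds=10000, seed=0)
print(p_value)
```

Some implementations count ties (greater than or equal) or add one to both numerator and denominator so that the estimate can never be exactly zero; this sketch simply follows the formula as written.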
Example 2 -- Calculating the p-value for correlation analysis (Pearson's R) Note: this is a one-sided hypothesis test, since we conduct the permutation test by asking \"how many times do we obtain a correlation coefficient that is greater than the observed value?\" import numpy as np from mlxtend.evaluate import permutation_test x = np.array([1, 2, 3, 4, 5, 6]) y = np.array([2, 4, 1, 5, 6, 7]) print('Observed pearson R: %.2f' % np.corrcoef(x, y)[1][0]) p_value = permutation_test(x, y, method='exact', func=lambda x, y: np.corrcoef(x, y)[1][0], seed=0) print('P value: %.2f' % p_value) Observed pearson R: 0.81 P value: 0.09 API permutation_test(x, y, func='x_mean != y_mean', method='exact', num_rounds=1000, seed=None) Nonparametric permutation test Parameters x : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the first sample (e.g., the treatment group). y : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the second sample (e.g., the control group). func : custom function or str (default: 'x_mean != y_mean') Function to compute the statistic for the permutation test. - If 'x_mean != y_mean', uses func=lambda x, y: np.abs(np.mean(x) - np.mean(y)) for a two-sided test. - If 'x_mean > y_mean', uses func=lambda x, y: np.mean(x) - np.mean(y) for a one-sided test. - If 'x_mean < y_mean', uses func=lambda x, y: np.mean(y) - np.mean(x) for a one-sided test. method : 'approximate' or 'exact' (default: 'exact') If 'exact' (default), all possible permutations are considered. If 'approximate', the number of drawn samples is given by num_rounds . Note that 'exact' is typically not feasible unless the dataset size is relatively small. num_rounds : int (default: 1000) The number of permutation samples if method='approximate' . seed : int or None (default: None) The random seed for generating permutation samples if method='approximate' . 
Returns p-value under the null hypothesis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/","title":"Permutation Test"},{"location":"user_guide/evaluate/permutation_test/#permutation-test","text":"An implementation of a permutation test for hypothesis testing -- testing the null hypothesis that two different groups come from the same distribution. from mlxtend.evaluate import permutation_test","title":"Permutation Test"},{"location":"user_guide/evaluate/permutation_test/#overview","text":"Permutation tests (also called exact tests, randomization tests, or re-randomization tests) are nonparametric test procedures to test the null hypothesis that two different groups come from the same distribution. A permutation test can be used for significance or hypothesis testing (including A/B testing) without making any assumptions about the sampling distribution (e.g., it doesn't require the samples to be normally distributed). Under the null hypothesis (treatment = control), all permutations are equally likely. (Note that there are (n+m)! permutations, where n is the number of records in the treatment sample, and m is the number of records in the control sample). For a two-sided test, we define the alternative hypothesis that the two samples are different (e.g., treatment != control). 1. Compute the difference (here: mean) of sample x and sample y. 2. Combine all measurements into a single dataset. 3. Draw a permuted dataset from all possible permutations of the dataset from step 2. 4. Divide the permuted dataset into two datasets x' and y' of size n and m , respectively. 5. Compute the difference (here: mean) of sample x' and sample y' and record this difference. 6. Repeat steps 3-5 until all permutations are evaluated. 7. Return the p-value as the number of times the recorded differences were more extreme than the original difference from step 1 
and divide this number by the total number of permutations. Here, the p-value is defined as the probability, given the null hypothesis (no difference between the samples) is true, that we obtain results that are at least as extreme as the results we observed (i.e., the sample difference from step 1). More formally, we can express the computation of the p-value as follows ([2]): p(t > t_0) = \\frac{1}{(n+m)!} \\sum^{(n+m)!}_{j=1} I(t_j > t_0), where t_0 is the observed value of the test statistic (step 1 in the list above), t_j is the statistic computed from the j-th resample (step 5), here t(x'_1, x'_2, ..., x'_n, y'_1, y'_2, ..., y'_m) = |\\bar{x'} - \\bar{y'}| , and I is the indicator function. Given a significance level that we specify prior to carrying out the permutation test (e.g., alpha=0.05), we fail to reject the null hypothesis if the p-value is greater than alpha. Note that if the number of permutations is large, evaluating all of them may not be computationally feasible. Thus, a common approximation is to perform k rounds of random permutations (where k is typically a value between 1000 and 2000).","title":"Overview"},{"location":"user_guide/evaluate/permutation_test/#references","text":"[1] Efron, Bradley and Tibshirani, R. J., An introduction to the bootstrap, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 1994. [2] Unpingco, Jos\u00e9. Python for probability, statistics, and machine learning. Springer, 2016. [3] Pitman, E. J. G., Significance tests which may be applied to samples from any population, Royal Statistical Society Supplement, 1937, 4: 119-30 and 225-32.","title":"References"},{"location":"user_guide/evaluate/permutation_test/#example-1-two-sided-permutation-test","text":"Perform a two-sided permutation test to test the null hypothesis that two groups, \"treatment\" and \"control\", come from the same distribution. We specify alpha=0.01 as our significance level. 
treatment = [ 28.44, 29.32, 31.22, 29.58, 30.34, 28.76, 29.21, 30.4 , 31.12, 31.78, 27.58, 31.57, 30.73, 30.43, 30.31, 30.32, 29.18, 29.52, 29.22, 30.56] control = [ 33.51, 30.63, 32.38, 32.52, 29.41, 30.93, 49.78, 28.96, 35.77, 31.42, 30.76, 30.6 , 23.64, 30.54, 47.78, 31.98, 34.52, 32.42, 31.32, 40.72] Since evaluating all possible permutations may take a while, we will use the approximation method (see the introduction for details): from mlxtend.evaluate import permutation_test p_value = permutation_test(treatment, control, method='approximate', num_rounds=10000, seed=0) print(p_value) 0.0066 Since p-value < alpha, we can reject the null hypothesis that the two samples come from the same distribution.","title":"Example 1 -- Two-sided permutation test"},{"location":"user_guide/evaluate/permutation_test/#example-2-calculating-the-p-value-for-correlation-analysis-pearsons-r","text":"Note: this is a one-sided hypothesis test, since we conduct the permutation test by asking \"how many times do we obtain a correlation coefficient that is greater than the observed value?\" import numpy as np from mlxtend.evaluate import permutation_test x = np.array([1, 2, 3, 4, 5, 6]) y = np.array([2, 4, 1, 5, 6, 7]) print('Observed pearson R: %.2f' % np.corrcoef(x, y)[1][0]) p_value = permutation_test(x, y, method='exact', func=lambda x, y: np.corrcoef(x, y)[1][0], seed=0) print('P value: %.2f' % p_value) Observed pearson R: 0.81 P value: 0.09","title":"Example 2 -- Calculating the p-value for correlation analysis (Pearson's R)"},{"location":"user_guide/evaluate/permutation_test/#api","text":"permutation_test(x, y, func='x_mean != y_mean', method='exact', num_rounds=1000, seed=None) Nonparametric permutation test Parameters x : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the first sample (e.g., the treatment group). y : list or numpy array with shape (n_datapoints,) A list or 1D numpy array of the second sample (e.g., the control group). 
func : custom function or str (default: 'x_mean != y_mean') Function to compute the statistic for the permutation test. - If 'x_mean != y_mean', uses func=lambda x, y: np.abs(np.mean(x) - np.mean(y)) for a two-sided test. - If 'x_mean > y_mean', uses func=lambda x, y: np.mean(x) - np.mean(y) for a one-sided test. - If 'x_mean < y_mean', uses func=lambda x, y: np.mean(y) - np.mean(x) for a one-sided test. method : 'approximate' or 'exact' (default: 'exact') If 'exact' (default), all possible permutations are considered. If 'approximate', the number of drawn samples is given by num_rounds . Note that 'exact' is typically not feasible unless the dataset size is relatively small. num_rounds : int (default: 1000) The number of permutation samples if method='approximate' . seed : int or None (default: None) The random seed for generating permutation samples if method='approximate' . Returns p-value under the null hypothesis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/","title":"API"},{"location":"user_guide/evaluate/proportion_difference/","text":"Proportion Difference Test Test of the difference of proportions for classifier performance comparison. from mlxtend.evaluate import proportion_difference Overview There are several different statistical hypothesis testing frameworks that are used in practice to compare the performance of classification models, including common methods such as difference of two proportions (here, the proportions are the estimated generalization accuracies from a test set), for which we can construct 95% confidence intervals based on the concept of the Normal Approximation to the Binomial that was covered in Part I. 
Performing a z-score test for two population proportions is arguably the most straightforward way to compare two models (but certainly not the best!): In a nutshell, if the 95% confidence intervals of the accuracies of two models do not overlap, we can reject the null hypothesis that the performance of both classifiers is equal at a significance level of \\alpha=0.05 (or 5% probability). Violations of assumptions aside (for instance, that the test set samples are not independent), as Thomas Dietterich noted based on empirical results in a simulation study [1], this test tends to have a high false positive rate (here: incorrectly detecting a difference when there is none), which is among the reasons why it is not recommended in practice. Nonetheless, for the sake of completeness, and since it is a commonly used method in practice, the general procedure is outlined below (which also generally applies to the different hypothesis tests presented later): formulate the hypothesis to be tested (for instance, the null hypothesis stating that the proportions are the same; consequently, the alternative hypothesis that the proportions are different, if we use a two-tailed test); decide upon a significance threshold (for instance, if the probability of observing a difference at least as extreme as the one observed is less than 5%, then we plan to reject the null hypothesis); analyze the data, compute the test statistic (here: z-score), and compare its associated p-value (probability) to the previously determined significance threshold; based on the p-value and significance threshold, either reject or fail to reject the null hypothesis at the given significance level and interpret the results. The z-score is computed as the observed difference divided by the square root of their combined variances z = \\frac{ACC_1 - ACC_2}{\\sqrt{\\sigma_{1}^2 + \\sigma_{2}^2}}, where ACC_1 is the accuracy of one model and ACC_2 is the accuracy of a second model estimated from the test set. 
Recall that we computed the variance of the estimated accuracy as \\sigma^2 = \\frac{ACC(1-ACC)}{n} in Part I and then computed the confidence interval (Normal Approximation Interval) as ACC \\pm z \\times \\sigma, where z=1.96 for a 95% confidence interval. Comparing the confidence intervals of two accuracy estimates and checking whether they overlap is then analogous to computing the z value for the difference in proportions and comparing the probability (p-value) to the chosen significance threshold. So, to compute the z-score directly for the difference of two proportions, ACC_1 and ACC_2 , we pool these proportions (assuming that ACC_1 and ACC_2 are the performances of two models estimated on two independent test sets of size n_1 and n_2 , respectively), ACC_{1, 2} = \\frac{ACC_1 \\times n_1 + ACC_2 \\times n_2}{n_1 + n_2}, and compute the standard error as \\sigma_{1,2} = \\sqrt{ACC_{1, 2} (1 - ACC_{1, 2}) \\left(\\frac{1}{n_1} + \\frac{1}{n_2}\\right)}, such that we can compute the z-score, z = \\frac{ACC_1 - ACC_2}{\\sigma_{1,2}}. Since we use the same test set for both models (which violates the independence assumption), we have n_1 = n_2 = n , so that we can simplify the z-score computation to z = \\frac{ACC_1 - ACC_2}{\\sqrt{2\\sigma^2}} = \\frac{ACC_1 - ACC_2}{\\sqrt{2\\cdot ACC_{1,2}(1-ACC_{1,2})/n}}, where ACC_{1, 2} is simply (ACC_1 + ACC_2)/2 . In the second step, based on the computed z value (this assumes that the test errors are independent, which is usually violated in practice as we use the same test set), we can reject the null hypothesis that a pair of models has equal performance (here, measured in \"classification accuracy\") at an \\alpha=0.05 level if |z| is greater than 1.96. Or, if we want to put in the extra work, we can compute the p-value from the standard normal cumulative distribution function evaluated at the observed z-score. 
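The Normal Approximation Interval and the z-scores described above can be computed with the Python standard library alone. The following is a minimal sketch, not mlxtend's implementation: the helper names normal_cdf, normal_approx_ci, z_test_unpooled, and z_test_pooled are hypothetical, the standard normal CDF is expressed through math.erf, and the returned p-values are two-sided.

```python
import math


def normal_cdf(z):
    # Standard normal CDF written in terms of the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_approx_ci(acc, n, z=1.96):
    # Normal Approximation Interval: ACC +/- z * sigma, sigma^2 = ACC(1 - ACC)/n
    sigma = math.sqrt(acc * (1.0 - acc) / n)
    return acc - z * sigma, acc + z * sigma


def z_test_unpooled(acc_1, acc_2, n_1, n_2=None):
    # z = (ACC_1 - ACC_2) / sqrt(sigma_1^2 + sigma_2^2)
    n_2 = n_1 if n_2 is None else n_2
    var_1 = acc_1 * (1.0 - acc_1) / n_1
    var_2 = acc_2 * (1.0 - acc_2) / n_2
    z = (acc_1 - acc_2) / math.sqrt(var_1 + var_2)
    return z, 2.0 * (1.0 - normal_cdf(abs(z)))  # two-sided p-value


def z_test_pooled(acc_1, acc_2, n_1, n_2=None):
    # Pooled proportion ACC_{1,2} and its standard error sigma_{1,2}
    n_2 = n_1 if n_2 is None else n_2
    acc_pooled = (acc_1 * n_1 + acc_2 * n_2) / (n_1 + n_2)
    se = math.sqrt(acc_pooled * (1.0 - acc_pooled) * (1.0 / n_1 + 1.0 / n_2))
    z = (acc_1 - acc_2) / se
    return z, 2.0 * (1.0 - normal_cdf(abs(z)))  # two-sided p-value


print(normal_approx_ci(0.84, 100), normal_approx_ci(0.92, 100))
print(z_test_unpooled(0.84, 0.92, 100))
print(z_test_pooled(0.84, 0.92, 100))
```

With ACC_1 = 0.84, ACC_2 = 0.92, and n = 100, the pooled variant gives z of about -1.741 and the unpooled variant gives z of about -1.754. Both two-sided p-values are roughly 0.08, and the two 95% intervals overlap. The unpooled z equals the value printed by proportion_difference in the example, and normal_cdf of that z (about 0.040) equals its reported p-value. This suggests, without confirming, that proportion_difference uses the unpooled variance and a lower-tail p-value.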
If this p-value is smaller than the significance level we set prior to conducting the test, we can reject the null hypothesis at that significance level. The problem with this test, though, is that we use the same test set to compute the accuracy of both classifiers; thus, it might be better to use a paired test such as a paired sample t-test, but a more robust alternative is the McNemar test. References [1] Dietterich, Thomas G. \"Approximate statistical tests for comparing supervised classification learning algorithms.\" Neural Computation 10, no. 7 (1998): 1895-1923. Example 1 - Difference of Proportions As an example for applying this test, consider the following predictions of two models: import numpy as np ## Dataset: # ground truth labels of the test dataset: y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) # predictions by 2 classifiers (`y_model_1` and `y_model_2`): y_model_1 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_2 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) Assume the test accuracies are as follows: acc_1 = np.sum(y_true == y_model_1) / y_true.shape[0] acc_2 = np.sum(y_true == y_model_2) / y_true.shape[0] print('Accuracy 
Model 1:', acc_1) print('Accuracy Model 2:', acc_2) Accuracy Model 1: 0.84 Accuracy Model 2: 0.92 Now, setting a significance threshold of \\alpha=0.05 and conducting the test from mlxtend.evaluate import proportion_difference z, p_value = proportion_difference(acc_1, acc_2, n_1=y_true.shape[0]) print('z: %.3f' % z) print('p-value: %.3f' % p_value) z: -1.754 p-value: 0.040 we find that p < \\alpha , i.e., a statistically significant difference between the model performances. Note, however, that the reported p-value of 0.040 equals the lower-tail probability of the standard normal distribution at z = -1.754, which corresponds to a one-sided test; under a two-sided test, |z| < 1.96 and the difference would not be significant at \\alpha=0.05 . It should be highlighted, though, that due to the typical independence violation (using the same test set) as well as its high false positive rate, this test is not recommended in practice. API proportion_difference(proportion_1, proportion_2, n_1, n_2=None) Computes the test statistic and p-value for a difference of proportions test. Parameters proportion_1 : float The first proportion proportion_2 : float The second proportion n_1 : int The sample size of the first test sample n_2 : int or None (default=None) The sample size of the second test sample. If None , n_1 = n_2 . Returns z, p : float or None, float Returns the z-score and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/proportion_difference/","title":"Proportion Difference Test"},{"location":"user_guide/evaluate/proportion_difference/#proportion-difference-test","text":"Test of the difference of proportions for classifier performance comparison. 
from mlxtend.evaluate import proportion_difference","title":"Proportion Difference Test"},{"location":"user_guide/evaluate/proportion_difference/#overview","text":"There are several different statistical hypothesis testing frameworks that are used in practice to compare the performance of classification models, including common methods such as difference of two proportions (here, the proportions are the estimated generalization accuracies from a test set), for which we can construct 95% confidence intervals based on the concept of the Normal Approximation to the Binomial that was covered in Part I. Performing a z-score test for two population proportions is arguably the most straightforward way to compare two models (but certainly not the best!): In a nutshell, if the 95% confidence intervals of the accuracies of two models do not overlap, we can reject the null hypothesis that the performance of both classifiers is equal at a significance level of \\alpha=0.05 (or 5% probability). Violations of assumptions aside (for instance, that the test set samples are not independent), as Thomas Dietterich noted based on empirical results in a simulation study [1], this test tends to have a high false positive rate (here: incorrectly detecting a difference when there is none), which is among the reasons why it is not recommended in practice. 
Nonetheless, for the sake of completeness, and since it is a commonly used method in practice, the general procedure is outlined below (which also generally applies to the different hypothesis tests presented later): formulate the hypothesis to be tested (for instance, the null hypothesis stating that the proportions are the same; consequently, the alternative hypothesis that the proportions are different, if we use a two-tailed test); decide upon a significance threshold (for instance, if the probability of observing a difference at least as extreme as the one observed is less than 5%, then we plan to reject the null hypothesis); analyze the data, compute the test statistic (here: z-score), and compare its associated p-value (probability) to the previously determined significance threshold; based on the p-value and significance threshold, either reject or fail to reject the null hypothesis at the given significance level and interpret the results. The z-score is computed as the observed difference divided by the square root of their combined variances z = \\frac{ACC_1 - ACC_2}{\\sqrt{\\sigma_{1}^2 + \\sigma_{2}^2}}, where ACC_1 is the accuracy of one model and ACC_2 is the accuracy of a second model estimated from the test set. Recall that we computed the variance of the estimated accuracy as \\sigma^2 = \\frac{ACC(1-ACC)}{n} in Part I and then computed the confidence interval (Normal Approximation Interval) as ACC \\pm z \\times \\sigma, where z=1.96 for a 95% confidence interval. Comparing the confidence intervals of two accuracy estimates and checking whether they overlap is then analogous to computing the z value for the difference in proportions and comparing the probability (p-value) to the chosen significance threshold. 
So, to compute the z-score directly for the difference of two proportions, ACC_1 and ACC_2 , we pool these proportions (assuming that ACC_1 and ACC_2 are the performances of two models estimated on two independent test sets of size n_1 and n_2 , respectively), ACC_{1, 2} = \\frac{ACC_1 \\times n_1 + ACC_2 \\times n_2}{n_1 + n_2}, and compute the standard error as \\sigma_{1,2} = \\sqrt{ACC_{1, 2} (1 - ACC_{1, 2}) \\left(\\frac{1}{n_1} + \\frac{1}{n_2}\\right)}, such that we can compute the z-score, z = \\frac{ACC_1 - ACC_2}{\\sigma_{1,2}}. Since we use the same test set for both models (which violates the independence assumption), we have n_1 = n_2 = n , so that we can simplify the z-score computation to z = \\frac{ACC_1 - ACC_2}{\\sqrt{2\\sigma^2}} = \\frac{ACC_1 - ACC_2}{\\sqrt{2\\cdot ACC_{1,2}(1-ACC_{1,2})/n}}, where ACC_{1, 2} is simply (ACC_1 + ACC_2)/2 . In the second step, based on the computed z value (this assumes that the test errors are independent, which is usually violated in practice as we use the same test set), we can reject the null hypothesis that a pair of models has equal performance (here, measured in \"classification accuracy\") at an \\alpha=0.05 level if |z| is greater than 1.96. Or, if we want to put in the extra work, we can compute the p-value from the standard normal cumulative distribution function evaluated at the observed z-score. If this p-value is smaller than the significance level we set prior to conducting the test, we can reject the null hypothesis at that significance level. The problem with this test, though, is that we use the same test set to compute the accuracy of both classifiers; thus, it might be better to use a paired test such as a paired sample t-test, but a more robust alternative is the McNemar test.","title":"Overview"},{"location":"user_guide/evaluate/proportion_difference/#references","text":"[1] Dietterich, Thomas G. \"Approximate statistical tests for comparing supervised classification learning algorithms.\" Neural Computation 10, no. 
7 (1998): 1895-1923.","title":"References"},{"location":"user_guide/evaluate/proportion_difference/#example-1-difference-of-proportions","text":"As an example for applying this test, consider the following predictions of two models: import numpy as np ## Dataset: # ground truth labels of the test dataset: y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) # predictions by 2 classifiers (`y_model_1` and `y_model_2`): y_model_1 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) y_model_2 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) Assume the test accuracies are as follows: acc_1 = np.sum(y_true == y_model_1) / y_true.shape[0] acc_2 = np.sum(y_true == y_model_2) / y_true.shape[0] print('Accuracy Model 1:', acc_1) print('Accuracy Model 2:', acc_2) Accuracy Model 1: 0.84 Accuracy Model 2: 0.92 Now, setting a significance threshold of \\alpha=0.05 and conducting the test from mlxtend.evaluate import proportion_difference z, p_value = proportion_difference(acc_1, acc_2, n_1=y_true.shape[0]) print('z: %.3f' % z) print('p-value: %.3f' % p_value) z: -1.754 p-value: 0.040 we find that p < \\alpha , i.e., a statistically significant difference between the model performances. Note, however, that the reported p-value of 0.040 equals the lower-tail probability of the standard normal distribution at z = -1.754, which corresponds to a one-sided test; under a two-sided test, |z| < 1.96 and the difference would not be significant at \\alpha=0.05 . 
It should be highlighted, though, that due to the typical independence violation (using the same test set) as well as its high false positive rate, this test is not recommended in practice.","title":"Example 1 - Difference of Proportions"},{"location":"user_guide/evaluate/proportion_difference/#api","text":"proportion_difference(proportion_1, proportion_2, n_1, n_2=None) Computes the test statistic and p-value for a difference of proportions test. Parameters proportion_1 : float The first proportion proportion_2 : float The second proportion n_1 : int The sample size of the first test sample n_2 : int or None (default=None) The sample size of the second test sample. If None , n_1 = n_2 . Returns z, p : float or None, float Returns the z-score and the p-value Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/proportion_difference/","title":"API"},{"location":"user_guide/evaluate/scoring/","text":"Scoring A function for computing various performance metrics. from mlxtend.evaluate import scoring Overview Confusion Matrix The confusion matrix (or error matrix ) is one way to summarize the performance of a classifier for binary classification tasks. This square matrix consists of columns and rows that list the number of instances as absolute or relative \"actual class\" vs. \"predicted class\" ratios. Let P be the label of class 1 and N be the label of a second class or the label of all classes that are not class 1 in a multi-class setting. Error and Accuracy Both the prediction error (ERR) and accuracy (ACC) provide general information about how many samples are misclassified. The error can be understood as the sum of all false predictions divided by the total number of predictions, and the accuracy is calculated as the sum of correct predictions divided by the total number of predictions, respectively. 
ERR = \\frac{FP + FN}{FP+ FN + TP + TN} = 1-ACC ACC = \\frac{TP + TN}{FP+ FN + TP + TN} = 1-ERR True and False Positive Rates The True Positive Rate (TPR) and False Positive Rate (FPR) are performance metrics that are especially useful for imbalanced class problems. In spam classification , for example, we are of course primarily interested in the detection and filtering out of spam . However, it is also important to decrease the number of messages that were incorrectly classified as spam ( False Positives ): A situation where a person misses an important message is considered \"worse\" than a situation where a person ends up with a few spam messages in his e-mail inbox. In contrast to the FPR , the True Positive Rate provides useful information about the fraction of positive (or relevant ) samples that were correctly identified out of the total pool of Positives . FPR = \\frac{FP}{N} = \\frac{FP}{FP + TN} TPR = \\frac{TP}{P} = \\frac{TP}{FN + TP} Precision, Recall, and the F1-Score Precision (PRE) and Recall (REC) are metrics that are more commonly used in Information Retrieval and are related to the False and True Positive Rates . In fact, Recall is synonymous with the True Positive Rate and is also sometimes called Sensitivity . The F1-Score can be understood as a combination of both Precision and Recall (namely, their harmonic mean). PRE = \\frac{TP}{TP + FP} REC = TPR = \\frac{TP}{P} = \\frac{TP}{FN + TP} F_1 = 2 \\cdot \\frac{PRE \\cdot REC}{PRE + REC} Sensitivity and Specificity Sensitivity (SEN) is synonymous with Recall and the True Positive Rate, whereas Specificity (SPC) is synonymous with the True Negative Rate -- Sensitivity measures the recovery rate of the Positives and, complementarily, the Specificity measures the recovery rate of the Negatives . SEN = TPR = REC = \\frac{TP}{P} = \\frac{TP}{FN + TP} SPC = TNR =\\frac{TN}{N} = \\frac{TN}{FP + TN} Matthews Correlation Coefficient Matthews correlation coefficient (MCC) was first formulated by Brian W. 
Matthews [3] in 1975 to assess the performance of protein secondary structure predictions. The MCC can be understood as a specific case of a linear correlation coefficient ( Pearson's R ) for a binary classification setting and is considered especially useful in unbalanced class settings. The previous metrics take values in the range between 0 (worst) and 1 (best), whereas the MCC is bounded between 1 (perfect correlation between ground truth and predicted outcome) and -1 (inverse or negative correlation) -- a value of 0 denotes a random prediction. MCC = \\frac{ TP \\times TN - FP \\times FN } {\\sqrt{ (TP + FP) ( TP + FN ) ( TN + FP ) ( TN + FN ) } } Average Per-Class Accuracy The \"overall\" accuracy is defined as the number of correct predictions ( true positives TP and true negatives TN) over all samples n : ACC = \\frac{TP + TN}{n} in a binary class setting. In a multi-class setting, we can generalize the computation of the accuracy as the fraction of all true predictions (the diagonal) over all samples n. ACC = \\frac{T}{n} Considering a multi-class problem with 3 classes (C0, C1, C2), let's assume our model made the following predictions: We compute the accuracy as: ACC = \\frac{3 + 50 + 18}{90} \\approx 0.79 Now, in order to compute the average per-class accuracy , we compute the binary accuracy for each class label separately; i.e., if class 1 is the positive class, classes 0 and 2 are both considered the negative class. APC\\;ACC = \\frac{83/90 + 71/90 + 78/90}{3} \\approx 0.86 References [1] S. Raschka. An overview of general performance metrics of binary classifier systems . Computing Research Repository (CoRR), abs/1410.5330, 2014. [2] Cyril Goutte and Eric Gaussier. A probabilistic interpretation of precision, recall and f-score, with implication for evaluation . In Advances in Information Retrieval, pages 345\u2013359. Springer, 2005. [3] Brian W. Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme . 
Biochimica et Biophysica Acta (BBA)- Protein Structure, 405(2):442\u2013451, 1975. Example 1 - Classification Error from mlxtend.evaluate import scoring y_targ = [1, 1, 1, 0, 0, 2, 0, 3] y_pred = [1, 0, 1, 0, 0, 2, 1, 3] res = scoring(y_target=y_targ, y_predicted=y_pred, metric='error') print('Error: %s%%' % (res * 100)) Error: 25.0% API scoring(y_target, y_predicted, metric='error', positive_label=1, unique_labels='auto') Compute a scoring metric for supervised learning. Parameters y_target : array-like, shape=[n_values] True class labels or target values. y_predicted : array-like, shape=[n_values] Predicted class labels or target values. metric : str (default: 'error') Performance metric: 'accuracy': (TP + TN)/(FP + FN + TP + TN) = 1-ERR 'per-class accuracy': Average per-class accuracy 'per-class error': Average per-class error 'error': (FP + FN)/(FP + FN + TP + TN) = 1-ACC 'false_positive_rate': FP/N = FP/(FP + TN) 'true_positive_rate': TP/P = TP/(FN + TP) 'true_negative_rate': TN/N = TN/(FP + TN) 'precision': TP/(TP + FP) 'recall': equal to 'true_positive_rate' 'sensitivity': equal to 'true_positive_rate' or 'recall' 'specificity': equal to 'true_negative_rate' 'f1': 2 * (PRE * REC)/(PRE + REC) 'matthews_corr_coef': (TP*TN - FP*FN) / (sqrt{(TP + FP)( TP + FN )( TN + FP )( TN + FN )}) Where: [TP: True positives, TN: True negatives, FP: False positives, FN: False negatives] positive_label : int (default: 1) Label of the positive class for binary classification metrics. unique_labels : str or array-like (default: 'auto') If 'auto', deduces the unique class labels from y_target . Returns score : float Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/scoring/","title":"Scoring"},{"location":"user_guide/evaluate/scoring/#scoring","text":"A function for computing various performance metrics. 
from mlxtend.evaluate import scoring","title":"Scoring"},{"location":"user_guide/evaluate/scoring/#overview","text":"","title":"Overview"},{"location":"user_guide/evaluate/scoring/#confusion-matrix","text":"The confusion matrix (or error matrix ) is one way to summarize the performance of a classifier for binary classification tasks. This square matrix consists of columns and rows that list the number of instances as absolute or relative \"actual class\" vs. \"predicted class\" ratios. Let P be the label of class 1 and N be the label of a second class or the label of all classes that are not class 1 in a multi-class setting.","title":"Confusion Matrix"},{"location":"user_guide/evaluate/scoring/#error-and-accuracy","text":"Both the prediction error (ERR) and accuracy (ACC) provide general information about how many samples are misclassified. The error is the number of false predictions divided by the total number of predictions, and the accuracy is the number of correct predictions divided by the total number of predictions. ERR = \\frac{FP + FN}{FP + FN + TP + TN} = 1-ACC ACC = \\frac{TP + TN}{FP + FN + TP + TN} = 1-ERR","title":"Error and Accuracy"},{"location":"user_guide/evaluate/scoring/#true-and-false-positive-rates","text":"The True Positive Rate (TPR) and False Positive Rate (FPR) are performance metrics that are especially useful for imbalanced class problems. In spam classification , for example, we are of course primarily interested in the detection and filtering out of spam . However, it is also important to decrease the number of messages that were incorrectly classified as spam ( False Positives ): A situation where a person misses an important message is considered \"worse\" than a situation where a person ends up with a few spam messages in their e-mail inbox. 
In contrast to the FPR , the True Positive Rate provides useful information about the fraction of positive (or relevant ) samples that were correctly identified out of the total pool of Positives . FPR = \\frac{FP}{N} = \\frac{FP}{FP + TN} TPR = \\frac{TP}{P} = \\frac{TP}{FN + TP}","title":"True and False Positive Rates"},{"location":"user_guide/evaluate/scoring/#precision-recall-and-the-f1-score","text":"Precision (PRE) and Recall (REC) are metrics that are more commonly used in information retrieval and are related to the False and True Positive Rates . In fact, Recall is synonymous with the True Positive Rate and is also sometimes called Sensitivity . The F1-Score combines Precision and Recall into a single number, their harmonic mean. PRE = \\frac{TP}{TP + FP} REC = TPR = \\frac{TP}{P} = \\frac{TP}{FN + TP} F_1 = 2 \\cdot \\frac{PRE \\cdot REC}{PRE + REC}","title":"Precision, Recall, and the F1-Score"},{"location":"user_guide/evaluate/scoring/#sensitivity-and-specificity","text":"Sensitivity (SEN) is synonymous with Recall and the True Positive Rate, whereas Specificity (SPC) is synonymous with the True Negative Rate: Sensitivity measures the recovery rate of the Positives and, complementarily, Specificity measures the recovery rate of the Negatives . SEN = TPR = REC = \\frac{TP}{P} = \\frac{TP}{FN + TP} SPC = TNR = \\frac{TN}{N} = \\frac{TN}{FP + TN}","title":"Sensitivity and Specificity"},{"location":"user_guide/evaluate/scoring/#matthews-correlation-coefficient","text":"The Matthews correlation coefficient (MCC) was first formulated by Brian W. Matthews [3] in 1975 to assess the performance of protein secondary structure predictions. The MCC can be understood as a special case of a linear correlation coefficient ( Pearson's R ) for a binary classification setting and is considered especially useful in imbalanced class settings. 
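The link between the MCC and Pearson's R can be checked numerically. The following NumPy sketch computes the MCC from the four confusion-matrix counts using the standard MCC formula. It then compares the result with Pearson's correlation between the raw 0/1 label vectors. The labels are made-up illustration data, and mcc_from_counts is a hypothetical helper, not part of the mlxtend API.

```python
import numpy as np


def mcc_from_counts(tp, tn, fp, fn):
    # Hypothetical helper: Matthews correlation coefficient from confusion-matrix counts
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den > 0 else 0.0


# Made-up binary ground truth and predictions (1 = positive class)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1, 0, 1])

# Count the four cells of the binary confusion matrix
tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

mcc = mcc_from_counts(tp, tn, fp, fn)
# Pearson's correlation coefficient between the two 0/1 label vectors
pearson_r = np.corrcoef(y_true, y_pred)[0, 1]
print(round(mcc, 4), round(pearson_r, 4))
```

For this toy data TP = TN = 4 and FP = FN = 1, so both values equal 15/25 = 0.6.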
The previous metrics take values between 0 (worst) and 1 (best), whereas the MCC ranges from -1 (inverse or negative correlation) to 1 (perfect correlation between ground truth and predicted outcome); a value of 0 indicates a prediction that is no better than random. MCC = \\frac{ TP \\times TN - FP \\times FN } {\\sqrt{ (TP + FP) ( TP + FN ) ( TN + FP ) ( TN + FN ) } }","title":"Matthews Correlation Coefficient"},{"location":"user_guide/evaluate/scoring/#average-per-class-accuracy","text":"In a binary class setting, the \"overall\" accuracy is defined as the number of correct predictions ( true positives TP and true negatives TN) over all samples n : ACC = \\frac{TP + TN}{n} In a multi-class setting, we can generalize the computation of the accuracy as the fraction of all true predictions (the diagonal of the confusion matrix) over all samples n. ACC = \\frac{T}{n} , where T is the number of true predictions. Considering a multi-class problem with 3 classes (C0, C1, C2), let's assume our model made the following predictions: We compute the accuracy as: ACC = \\frac{3 + 50 + 18}{90} \\approx 0.79 Now, in order to compute the average per-class accuracy , we compute the binary accuracy for each class label separately; i.e., if class 1 is the positive class, classes 0 and 2 are both considered the negative class. APC\\;ACC = \\frac{83/90 + 71/90 + 78/90}{3} \\approx 0.86","title":"Average Per-Class Accuracy"},{"location":"user_guide/evaluate/scoring/#references","text":"[1] S. Raschka. An overview of general performance metrics of binary classifier systems . Computing Research Repository (CoRR), abs/1410.5330, 2014. [2] Cyril Goutte and Eric Gaussier. A probabilistic interpretation of precision, recall and f-score, with implication for evaluation . In Advances in Information Retrieval, pages 345\u2013359. Springer, 2005. [3] Brian W. Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme . 
Biochimica et Biophysica Acta (BBA)- Protein Structure, 405(2):442\u2013451, 1975.","title":"References"},{"location":"user_guide/evaluate/scoring/#example-1-classification-error","text":"from mlxtend.evaluate import scoring y_targ = [1, 1, 1, 0, 0, 2, 0, 3] y_pred = [1, 0, 1, 0, 0, 2, 1, 3] res = scoring(y_target=y_targ, y_predicted=y_pred, metric='error') print('Error: %s%%' % (res * 100)) Error: 25.0%","title":"Example 1 - Classification Error"},{"location":"user_guide/evaluate/scoring/#api","text":"scoring(y_target, y_predicted, metric='error', positive_label=1, unique_labels='auto') Compute a scoring metric for supervised learning. Parameters y_target : array-like, shape=[n_values] True class labels or target values. y_predicted : array-like, shape=[n_values] Predicted class labels or target values. metric : str (default: 'error') Performance metric: 'accuracy': (TP + TN)/(FP + FN + TP + TN) = 1-ERR 'per-class accuracy': Average per-class accuracy 'per-class error': Average per-class error 'error': (FP + FN)/(FP + FN + TP + TN) = 1-ACC 'false_positive_rate': FP/N = FP/(FP + TN) 'true_positive_rate': TP/P = TP/(FN + TP) 'true_negative_rate': TN/N = TN/(FP + TN) 'precision': TP/(TP + FP) 'recall': equal to 'true_positive_rate' 'sensitivity': equal to 'true_positive_rate' or 'recall' 'specificity': equal to 'true_negative_rate' 'f1': 2 * (PRE * REC)/(PRE + REC) 'matthews_corr_coef': (TP*TN - FP*FN) / (sqrt{(TP + FP)( TP + FN )( TN + FP )( TN + FN )}) Where: [TP: True positives, TN: True negatives, FP: False positives, FN: False negatives] positive_label : int (default: 1) Label of the positive class for binary classification metrics. 
unique_labels : str or array-like (default: 'auto') If 'auto', deduces the unique class labels from y_target . Returns score : float Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/evaluate/scoring/","title":"API"},{"location":"user_guide/feature_extraction/LinearDiscriminantAnalysis/","text":"Linear Discriminant Analysis Implementation of Linear Discriminant Analysis for dimensionality reduction from mlxtend.feature_extraction import LinearDiscriminantAnalysis Overview Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications. The goal is to project a dataset onto a lower-dimensional space with good class-separability in order to avoid overfitting (\"curse of dimensionality\") and also reduce computational costs. Ronald A. Fisher formulated the Linear Discriminant in 1936 ( The Use of Multiple Measurements in Taxonomic Problems ), and it also has some practical uses as a classifier. The original linear discriminant was described for a 2-class problem, and it was later generalized as \"multi-class Linear Discriminant Analysis\" or \"Multiple Discriminant Analysis\" by C. R. Rao in 1948 ( The utilization of multiple measurements in problems of biological classification ). The general LDA approach is very similar to a Principal Component Analysis, but in addition to finding the component axes that maximize the variance of our data (PCA), we are also interested in the axes that maximize the separation between multiple classes (LDA). So, in a nutshell, the goal of an LDA is often to project a feature space (a dataset of d-dimensional samples) onto a smaller k-dimensional subspace (where k \\leq c-1 for c classes, because the between-class scatter matrix has rank at most c-1 ) while maintaining the class-discriminatory information. 
In general, dimensionality reduction not only helps reduce computational costs for a given classification task, but it can also help avoid overfitting by reducing the error in parameter estimation (\"curse of dimensionality\"). Summarizing the LDA approach in 5 steps Listed below are the 5 general steps for performing a linear discriminant analysis. Compute the d -dimensional mean vectors for the different classes from the dataset. Compute the scatter matrices (between-class and within-class scatter matrix). Compute the eigenvectors ( \\mathbf{e_1}, \\; \\mathbf{e_2}, \\; ..., \\; \\mathbf{e_d} ) and corresponding eigenvalues ( \\mathbf{\\lambda_1}, \\; \\mathbf{\\lambda_2}, \\; ..., \\; \\mathbf{\\lambda_d} ) for the scatter matrices (more precisely, of the inverted within-class scatter matrix multiplied by the between-class scatter matrix). Sort the eigenvectors by decreasing eigenvalues and choose the k eigenvectors with the largest eigenvalues to form a d \\times k dimensional matrix \\mathbf{W} (where every column represents an eigenvector). Use this d \\times k eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication \\mathbf{Y} = \\mathbf{X} \\times \\mathbf{W} (where \\mathbf{X} is an n \\times d -dimensional matrix representing the n samples, and \\mathbf{Y} contains the transformed n \\times k -dimensional samples in the new subspace). References Fisher, Ronald A. \" The use of multiple measurements in taxonomic problems. \" Annals of Eugenics 7.2 (1936): 179-188. Rao, C. Radhakrishna. \" The utilization of multiple measurements in problems of biological classification. \" Journal of the Royal Statistical Society. Series B (Methodological) 10.2 (1948): 159-203. 
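The five LDA steps summarized above can be sketched in plain NumPy. This is a minimal illustration, not the mlxtend implementation. It assumes class-size-weighted between-class scatter and eigenpairs of inv(S_W) S_B, and it runs on randomly generated toy data. lda_fit is a hypothetical helper name.

```python
import numpy as np


def lda_fit(X, y, k):
    # Hypothetical helper: returns the d x k projection matrix W and the sorted eigenvalues
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((d, d))  # within-class scatter matrix
    S_B = np.zeros((d, d))  # between-class scatter matrix
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)              # step 1: class mean vector
        centered = X_c - mean_c
        S_W += centered.T @ centered           # step 2: within-class scatter
        diff = (mean_c - overall_mean).reshape(d, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)  # step 2: between-class scatter
    # Step 3: eigenpairs of inv(S_W) S_B; the product is not symmetric, so use eig
    e_vals, e_vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    e_vals, e_vecs = e_vals.real, e_vecs.real  # drop round-off imaginary parts
    # Step 4: sort by decreasing eigenvalue and keep the top k eigenvectors as columns
    order = np.argsort(e_vals)[::-1]
    W = e_vecs[:, order[:k]]
    return W, e_vals[order]


# Toy data (an assumption for illustration): three Gaussian classes in 4 dimensions
rng = np.random.RandomState(0)
means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
X = np.vstack([rng.randn(20, 4) + m for m in means])
y = np.repeat([0, 1, 2], 20)

W, e_vals = lda_fit(X, y, k=2)
Y = X @ W  # step 5: project the n x d data onto the k-dimensional subspace
print(W.shape, Y.shape)  # (4, 2) (60, 2)
```

With three classes, only the first two eigenvalues are clearly nonzero, and the remaining ones are zero up to round-off. This reflects the fact that LDA yields at most c-1 informative discriminants for c classes.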
Example 1 - LDA on Iris from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import LinearDiscriminantAnalysis X, y = iris_data() X = standardize(X) lda = LinearDiscriminantAnalysis(n_discriminants=2) lda.fit(X, y) X_lda = lda.transform(X) import matplotlib.pyplot as plt with plt.style.context('seaborn-whitegrid'): plt.figure(figsize=(6, 4)) for lab, col in zip((0, 1, 2), ('blue', 'red', 'green')): plt.scatter(X_lda[y == lab, 0], X_lda[y == lab, 1], label=lab, c=col) plt.xlabel('Linear Discriminant 1') plt.ylabel('Linear Discriminant 2') plt.legend(loc='lower right') plt.tight_layout() plt.show() Example 2 - Plotting the Between-Class Variance Explained Ratio from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import LinearDiscriminantAnalysis X, y = iris_data() X = standardize(X) lda = LinearDiscriminantAnalysis(n_discriminants=None) lda.fit(X, y) X_lda = lda.transform(X) import numpy as np tot = sum(lda.e_vals_) var_exp = [(i / tot)*100 for i in sorted(lda.e_vals_, reverse=True)] cum_var_exp = np.cumsum(var_exp) with plt.style.context('seaborn-whitegrid'): fig, ax = plt.subplots(figsize=(6, 4)) plt.bar(range(4), var_exp, alpha=0.5, align='center', label='individual explained variance') plt.step(range(4), cum_var_exp, where='mid', label='cumulative explained variance') plt.ylabel('Explained variance ratio') plt.xlabel('Linear discriminants') plt.xticks(range(4)) ax.set_xticklabels(np.arange(1, X.shape[1] + 1)) plt.legend(loc='best') plt.tight_layout() API LinearDiscriminantAnalysis(n_discriminants=None) Linear Discriminant Analysis Class Parameters n_discriminants : int (default: None) The number of discriminants for transformation. Keeps the original dimensions of the dataset if None . Attributes w_ : array-like, shape=[n_features, n_discriminants] Projection matrix e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. 
e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/LinearDiscriminantAnalysis/ Methods fit(X, y, n_classes=None) Fit the LDA model with X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. Returns self : object transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_discriminants] Projected training vectors.","title":"Linear Discriminant Analysis"},{"location":"user_guide/feature_extraction/LinearDiscriminantAnalysis/#linear-discriminant-analysis","text":"Implementation of Linear Discriminant Analysis for dimensionality reduction from mlxtend.feature_extraction import LinearDiscriminantAnalysis","title":"Linear Discriminant Analysis"},{"location":"user_guide/feature_extraction/LinearDiscriminantAnalysis/#overview","text":"Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications. The goal is to project a dataset onto a lower-dimensional space with good class-separability in order to avoid overfitting (\"curse of dimensionality\") and also reduce computational costs. Ronald A. 
Fisher formulated the Linear Discriminant in 1936 ( The Use of Multiple Measurements in Taxonomic Problems ), and it also has some practical uses as a classifier. The original linear discriminant was described for a 2-class problem, and it was later generalized as \"multi-class Linear Discriminant Analysis\" or \"Multiple Discriminant Analysis\" by C. R. Rao in 1948 ( The utilization of multiple measurements in problems of biological classification ). The general LDA approach is very similar to a Principal Component Analysis, but in addition to finding the component axes that maximize the variance of our data (PCA), we are also interested in the axes that maximize the separation between multiple classes (LDA). So, in a nutshell, the goal of an LDA is often to project a feature space (a dataset of d-dimensional samples) onto a smaller k-dimensional subspace (where k \\leq c-1 for c classes, because the between-class scatter matrix has rank at most c-1 ) while maintaining the class-discriminatory information. In general, dimensionality reduction not only helps reduce computational costs for a given classification task, but it can also help avoid overfitting by reducing the error in parameter estimation (\"curse of dimensionality\").","title":"Overview"},{"location":"user_guide/feature_extraction/LinearDiscriminantAnalysis/#summarizing-the-lda-approach-in-5-steps","text":"Listed below are the 5 general steps for performing a linear discriminant analysis. Compute the d -dimensional mean vectors for the different classes from the dataset. Compute the scatter matrices (between-class and within-class scatter matrix). Compute the eigenvectors ( \\mathbf{e_1}, \\; \\mathbf{e_2}, \\; ..., \\; \\mathbf{e_d} ) and corresponding eigenvalues ( \\mathbf{\\lambda_1}, \\; \\mathbf{\\lambda_2}, \\; ..., \\; \\mathbf{\\lambda_d} ) for the scatter matrices (more precisely, of the inverted within-class scatter matrix multiplied by the between-class scatter matrix). 
Sort the eigenvectors by decreasing eigenvalues and choose the k eigenvectors with the largest eigenvalues to form a d \\times k dimensional matrix \\mathbf{W} (where every column represents an eigenvector). Use this d \\times k eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication \\mathbf{Y} = \\mathbf{X} \\times \\mathbf{W} (where \\mathbf{X} is an n \\times d -dimensional matrix representing the n samples, and \\mathbf{Y} contains the transformed n \\times k -dimensional samples in the new subspace).","title":"Summarizing the LDA approach in 5 steps"},{"location":"user_guide/feature_extraction/LinearDiscriminantAnalysis/#references","text":"Fisher, Ronald A. \" The use of multiple measurements in taxonomic problems. \" Annals of Eugenics 7.2 (1936): 179-188. Rao, C. Radhakrishna. \" The utilization of multiple measurements in problems of biological classification. \" Journal of the Royal Statistical Society. Series B (Methodological) 10.2 (1948): 159-203.","title":"References"},{"location":"user_guide/feature_extraction/LinearDiscriminantAnalysis/#example-1-lda-on-iris","text":"from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import LinearDiscriminantAnalysis X, y = iris_data() X = standardize(X) lda = LinearDiscriminantAnalysis(n_discriminants=2) lda.fit(X, y) X_lda = lda.transform(X) import matplotlib.pyplot as plt with plt.style.context('seaborn-whitegrid'): plt.figure(figsize=(6, 4)) for lab, col in zip((0, 1, 2), ('blue', 'red', 'green')): plt.scatter(X_lda[y == lab, 0], X_lda[y == lab, 1], label=lab, c=col) plt.xlabel('Linear Discriminant 1') plt.ylabel('Linear Discriminant 2') plt.legend(loc='lower right') plt.tight_layout() plt.show()","title":"Example 1 - LDA on Iris"},{"location":"user_guide/feature_extraction/LinearDiscriminantAnalysis/#example-2-plotting-the-between-class-variance-explained-ratio","text":"from mlxtend.data import 
iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import LinearDiscriminantAnalysis X, y = iris_data() X = standardize(X) lda = LinearDiscriminantAnalysis(n_discriminants=None) lda.fit(X, y) X_lda = lda.transform(X) import numpy as np import matplotlib.pyplot as plt tot = sum(lda.e_vals_) var_exp = [(i / tot)*100 for i in sorted(lda.e_vals_, reverse=True)] cum_var_exp = np.cumsum(var_exp) with plt.style.context('seaborn-whitegrid'): fig, ax = plt.subplots(figsize=(6, 4)) plt.bar(range(4), var_exp, alpha=0.5, align='center', label='individual explained variance') plt.step(range(4), cum_var_exp, where='mid', label='cumulative explained variance') plt.ylabel('Explained variance ratio') plt.xlabel('Linear discriminants') plt.xticks(range(4)) ax.set_xticklabels(np.arange(1, X.shape[1] + 1)) plt.legend(loc='best') plt.tight_layout()","title":"Example 2 - Plotting the Between-Class Variance Explained Ratio"},{"location":"user_guide/feature_extraction/LinearDiscriminantAnalysis/#api","text":"LinearDiscriminantAnalysis(n_discriminants=None) Linear Discriminant Analysis Class Parameters n_discriminants : int (default: None) The number of discriminants for transformation. Keeps the original dimensions of the dataset if None . Attributes w_ : array-like, shape=[n_features, n_discriminants] Projection matrix e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/LinearDiscriminantAnalysis/","title":"API"},{"location":"user_guide/feature_extraction/LinearDiscriminantAnalysis/#methods","text":"fit(X, y, n_classes=None) Fit the LDA model with X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] Target values. 
n_classes : int (default: None) A positive integer to declare the number of class labels if not all class labels are present in a partial training set. Gets the number of class labels automatically if None. Returns self : object transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_discriminants] Projected training vectors.","title":"Methods"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/","text":"Principal Component Analysis Implementation of Principal Component Analysis for dimensionality reduction from mlxtend.feature_extraction import PrincipalComponentAnalysis Overview The sheer size of data in the modern age is not only a challenge for computer hardware but also a main bottleneck for the performance of many machine learning algorithms. The main goal of PCA is to identify patterns in data; PCA aims to detect the correlation between variables. Attempting to reduce the dimensionality only makes sense if strong correlations between variables exist. In a nutshell, this is what PCA is all about: finding the directions of maximum variance in high-dimensional data and projecting the data onto a lower-dimensional subspace while retaining most of the information. PCA and Dimensionality Reduction Often, the desired goal is to reduce the dimensions of a d -dimensional dataset by projecting it onto a k -dimensional subspace (where k\\;<\\;d ) in order to increase the computational efficiency while retaining most of the information. An important question is \"what is the size of k that represents the data 'well'?\" Later, we will compute eigenvectors (the principal components) of a dataset and collect them in a projection matrix. 
Each of those eigenvectors is associated with an eigenvalue, which can be interpreted as the \"length\" or \"magnitude\" of the corresponding eigenvector, that is, the amount of variance in the data along that direction. If some eigenvalues are significantly larger than others, then reducing the dataset via PCA onto a lower-dimensional subspace by dropping the \"less informative\" eigenpairs is reasonable. A Summary of the PCA Approach Standardize the data. Obtain the eigenvectors and eigenvalues from the covariance matrix or correlation matrix, or perform a Singular Value Decomposition. Sort eigenvalues in descending order and choose the k eigenvectors that correspond to the k largest eigenvalues, where k is the number of dimensions of the new feature subspace ( k \\le d ). Construct the projection matrix \\mathbf{W} from the selected k eigenvectors. Transform the original dataset \\mathbf{X} via \\mathbf{W} to obtain a k -dimensional feature subspace \\mathbf{Y} . References Pearson, Karl. \"LIII. On lines and planes of closest fit to systems of points in space. \" The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2.11 (1901): 559-572. 
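The five-step PCA summary above can be sketched in NumPy. This minimal version follows the eigendecomposition route on the covariance matrix of standardized data rather than SVD, and it uses synthetic toy data. pca_fit_transform is a hypothetical helper, not the mlxtend API.

```python
import numpy as np


def pca_fit_transform(X, k):
    # Hypothetical helper: eigendecomposition-based PCA following the five steps
    # Step 1: standardize each feature to zero mean and unit variance
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: eigenpairs of the covariance matrix (symmetric, so eigh applies)
    cov = np.cov(X_std, rowvar=False)
    e_vals, e_vecs = np.linalg.eigh(cov)
    # Step 3: sort eigenpairs by decreasing eigenvalue
    order = np.argsort(e_vals)[::-1]
    e_vals, e_vecs = e_vals[order], e_vecs[:, order]
    # Step 4: projection matrix W built from the top-k eigenvectors (d x k)
    W = e_vecs[:, :k]
    # Step 5: project the standardized data, Y = X W (n x k)
    return X_std @ W, e_vals


# Toy data (an assumption for illustration): the third feature nearly duplicates the first
rng = np.random.RandomState(1)
a = rng.randn(100, 2)
X = np.column_stack([a[:, 0], a[:, 1], 2 * a[:, 0] + 0.1 * rng.randn(100)])

Y, e_vals = pca_fit_transform(X, k=2)
explained = e_vals / e_vals.sum()
print(Y.shape)  # (100, 2)
print(np.round(explained, 3))
```

Because the third feature is almost a copy of the first, the first two components carry nearly all of the variance. Dropping the third eigenpair therefore loses very little information.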
Example 1 - PCA on Iris from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import PrincipalComponentAnalysis X, y = iris_data() X = standardize(X) pca = PrincipalComponentAnalysis(n_components=2) pca.fit(X) X_pca = pca.transform(X) import matplotlib.pyplot as plt with plt.style.context('seaborn-whitegrid'): plt.figure(figsize=(6, 4)) for lab, col in zip((0, 1, 2), ('blue', 'red', 'green')): plt.scatter(X_pca[y==lab, 0], X_pca[y==lab, 1], label=lab, c=col) plt.xlabel('Principal Component 1') plt.ylabel('Principal Component 2') plt.legend(loc='lower center') plt.tight_layout() plt.show() Example 2 - Plotting the Variance Explained Ratio from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import PrincipalComponentAnalysis X, y = iris_data() X = standardize(X) pca = PrincipalComponentAnalysis(n_components=None) pca.fit(X) X_pca = pca.transform(X) import numpy as np tot = sum(pca.e_vals_) var_exp = [(i / tot)*100 for i in sorted(pca.e_vals_, reverse=True)] cum_var_exp = np.cumsum(var_exp) with plt.style.context('seaborn-whitegrid'): fig, ax = plt.subplots(figsize=(6, 4)) plt.bar(range(4), var_exp, alpha=0.5, align='center', label='individual explained variance') plt.step(range(4), cum_var_exp, where='mid', label='cumulative explained variance') plt.ylabel('Explained variance ratio') plt.xlabel('Principal components') plt.xticks(range(4)) ax.set_xticklabels(np.arange(1, X.shape[1] + 1)) plt.legend(loc='best') plt.tight_layout() Example 3 - PCA via SVD While the eigendecomposition of the covariance or correlation matrix may be more intuitive, many PCA implementations perform a Singular Value Decomposition (SVD) to improve the computational efficiency. Another advantage of using SVD is that the results tend to be more numerically stable, since we can decompose the input matrix directly without the additional covariance-matrix step. 
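The following NumPy sketch checks that the two routes agree: the SVD of the centered data matrix yields the same eigenvalues as the eigendecomposition of the covariance matrix and, up to sign, the same principal axes. It uses random toy data and is independent of how mlxtend's solver='svd' option is implemented internally.

```python
import numpy as np

# Toy data (an assumption for illustration): 50 samples with 4 correlated features
rng = np.random.RandomState(0)
X = rng.randn(50, 4) @ rng.randn(4, 4)
Xc = X - X.mean(axis=0)  # both routes operate on centered data
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix, sorted by decreasing eigenvalue
e_vals, e_vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(e_vals)[::-1]
e_vals, e_vecs = e_vals[order], e_vecs[:, order]

# Route 2: SVD of the centered data matrix itself (no covariance matrix is formed)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = S ** 2 / (n - 1)  # squared singular values rescaled to covariance eigenvalues
svd_vecs = Vt.T              # right singular vectors are the principal axes

# Each axis may come out with the opposite sign, so align signs column by column
signs = np.sign(np.sum(e_vecs * svd_vecs, axis=0))
print(np.allclose(e_vals, svd_vals))
print(np.allclose(e_vecs, svd_vecs * signs))
```

The sign alignment step is needed for the same reason the Iris projections in Examples 1 and 3 can be mirror images: an eigenvector and its negation are equally valid.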
from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import PrincipalComponentAnalysis X, y = iris_data() X = standardize(X) pca = PrincipalComponentAnalysis(n_components=2, solver='svd') pca.fit(X) X_pca = pca.transform(X) import matplotlib.pyplot as plt with plt.style.context('seaborn-whitegrid'): plt.figure(figsize=(6, 4)) for lab, col in zip((0, 1, 2), ('blue', 'red', 'green')): plt.scatter(X_pca[y==lab, 0], X_pca[y==lab, 1], label=lab, c=col) plt.xlabel('Principal Component 1') plt.ylabel('Principal Component 2') plt.legend(loc='lower center') plt.tight_layout() plt.show() If we compare this PCA projection to the plot in Example 1, we notice that they are mirror images of each other. This is not due to an error in either implementation; the difference arises because, depending on the eigensolver, eigenvectors can have either negative or positive signs. For instance, if v is an eigenvector of a matrix \\Sigma with eigenvalue \\lambda , so that \\Sigma v = \\lambda v, then -v is also an eigenvector with the same eigenvalue, since \\Sigma(-v) = -\\Sigma v = -\\lambda v = \\lambda(-v). Example 4 - Factor Loadings After calling the fit method, the factor loadings are available via the loadings_ attribute. In simple terms, the loadings are the unstandardized values of the eigenvectors. In other words, we can interpret the loadings as the covariances (or correlations, in case we standardized the input features) between the input features and the principal components (or eigenvectors), which have been scaled to unit length. Because the loadings are scaled, they are comparable by magnitude, and we can assess how much of the variance in a component is attributed to each input feature (as the components are just weighted linear combinations of the input features). 
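The loading interpretation can be made concrete with a short NumPy sketch. It assumes one common definition of loadings: unit-length eigenvectors scaled by the square roots of their eigenvalues. This scaling is an assumption here, not taken from the mlxtend source, so treat the code as an illustration rather than the library's implementation. For standardized features, the sketch checks that each loading equals the correlation between an input feature and the scores on the corresponding principal component.

```python
import numpy as np

# Toy data (an assumption for illustration): 200 samples with 3 correlated features
rng = np.random.RandomState(0)
X = rng.randn(200, 3) @ rng.randn(3, 3)
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized input features

# Eigendecomposition of the correlation matrix, sorted by decreasing eigenvalue
R = np.corrcoef(Z, rowvar=False)
e_vals, e_vecs = np.linalg.eigh(R)
order = np.argsort(e_vals)[::-1]
e_vals, e_vecs = e_vals[order], e_vecs[:, order]

# Assumed loading definition: each eigenvector column scaled by sqrt of its eigenvalue
loadings = e_vecs * np.sqrt(e_vals)

# Check: loadings[j, k] equals the correlation between feature j and component k's scores
scores = Z @ e_vecs
corr = np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1] for k in range(3)]
                 for j in range(3)])
print(np.allclose(loadings, corr))
```

Under this definition, the loadings of standardized data are bounded by -1 and 1. This is why a common y-axis range of [-1, 1] is a natural choice for loading bar plots.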
from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import PrincipalComponentAnalysis import matplotlib.pyplot as plt X, y = iris_data() X = standardize(X) pca = PrincipalComponentAnalysis(n_components=2, solver='eigen') pca.fit(X) xlabels = ['sepal length', 'sepal width', 'petal length', 'petal width'] fig, ax = plt.subplots(1, 2, figsize=(8, 3)) ax[0].bar(range(4), pca.loadings_[:, 0], align='center') ax[1].bar(range(4), pca.loadings_[:, 1], align='center') ax[0].set_ylabel('Factor loading onto PC1') ax[1].set_ylabel('Factor loading onto PC2') ax[0].set_xticks(range(4)) ax[1].set_xticks(range(4)) ax[0].set_xticklabels(xlabels, rotation=45) ax[1].set_xticklabels(xlabels, rotation=45) ax[0].set_ylim([-1, 1]) ax[1].set_ylim([-1, 1]) plt.tight_layout() For instance, we may say that most of the variance in the first component is attributed to the petal features (although the loading of sepal length on PC1 is not much smaller in magnitude). In contrast, the remaining variance captured by PC2 is mostly due to the sepal width. Note that we know from Example 2 that PC1 explains most of the variance, and based on the information from the loading plots, we may say that the petal features combined with sepal length explain most of the spread in the data. API PrincipalComponentAnalysis(n_components=None, solver='eigen') Principal Component Analysis Class Parameters n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None . solver : str (default: 'eigen') Method for performing the matrix decomposition. {'eigen', 'svd'} Attributes w_ : array-like, shape=[n_features, n_components] Projection matrix e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. 
loadings_ : array-like, shape=[n_features, n_features] The factor loadings of the original variables onto the principal components. The columns are the principal components, and the rows are the feature loadings. For instance, the first column contains the loadings onto the first principal component. Note that the signs may be flipped depending on whether you use the 'eigen' or 'svd' solver; this does not affect the interpretation of the loadings, though. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/PrincipalComponentAnalysis/ Methods fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object transform(X) Apply the linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"Principal Component Analysis"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#principal-component-analysis","text":"Implementation of Principal Component Analysis for dimensionality reduction from mlxtend.feature_extraction import PrincipalComponentAnalysis","title":"Principal Component Analysis"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#overview","text":"The sheer size of data in the modern age is not only a challenge for computer hardware but also a main bottleneck for the performance of many machine learning algorithms. The main goal of PCA is to identify patterns in data; PCA aims to detect the correlation between variables. Attempting to reduce the dimensionality only makes sense if strong correlations between variables exist. 
In a nutshell, this is what PCA is all about: finding the directions of maximum variance in high-dimensional data and projecting the data onto a lower-dimensional subspace while retaining most of the information.","title":"Overview"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#pca-and-dimensionality-reduction","text":"Often, the desired goal is to reduce the dimensions of a d -dimensional dataset by projecting it onto a k -dimensional subspace (where k\\;<\\;d ) in order to increase the computational efficiency while retaining most of the information. An important question is \"what is the size of k that represents the data 'well'?\" Later, we will compute eigenvectors (the principal components) of a dataset and collect them in a projection matrix. Each of those eigenvectors is associated with an eigenvalue, which can be interpreted as the \"length\" or \"magnitude\" of the corresponding eigenvector. If some eigenvalues have a significantly larger magnitude than others, then reducing the dataset via PCA onto a lower-dimensional subspace by dropping the \"less informative\" eigenpairs is reasonable.","title":"PCA and Dimensionality Reduction"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#a-summary-of-the-pca-approach","text":"Standardize the data. Obtain the eigenvectors and eigenvalues from the covariance matrix or correlation matrix, or perform a Singular Value Decomposition. Sort the eigenvalues in descending order and choose the k eigenvectors that correspond to the k largest eigenvalues, where k is the number of dimensions of the new feature subspace ( k \\le d ). Construct the projection matrix \\mathbf{W} from the selected k eigenvectors. Transform the original dataset \\mathbf{X} via \\mathbf{W} to obtain a k -dimensional feature subspace \\mathbf{Y} .","title":"A Summary of the PCA Approach"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#references","text":"Pearson, Karl. \"LIII. 
On lines and planes of closest fit to systems of points in space. \" The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2.11 (1901): 559-572.","title":"References"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#example-1-pca-on-iris","text":"from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import PrincipalComponentAnalysis X, y = iris_data() X = standardize(X) pca = PrincipalComponentAnalysis(n_components=2) pca.fit(X) X_pca = pca.transform(X) import matplotlib.pyplot as plt with plt.style.context('seaborn-whitegrid'): plt.figure(figsize=(6, 4)) for lab, col in zip((0, 1, 2), ('blue', 'red', 'green')): plt.scatter(X_pca[y==lab, 0], X_pca[y==lab, 1], label=lab, c=col) plt.xlabel('Principal Component 1') plt.ylabel('Principal Component 2') plt.legend(loc='lower center') plt.tight_layout() plt.show()","title":"Example 1 - PCA on Iris"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#example-2-plotting-the-variance-explained-ratio","text":"from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import PrincipalComponentAnalysis import matplotlib.pyplot as plt X, y = iris_data() X = standardize(X) pca = PrincipalComponentAnalysis(n_components=None) pca.fit(X) X_pca = pca.transform(X) import numpy as np tot = sum(pca.e_vals_) var_exp = [(i / tot)*100 for i in sorted(pca.e_vals_, reverse=True)] cum_var_exp = np.cumsum(var_exp) with plt.style.context('seaborn-whitegrid'): fig, ax = plt.subplots(figsize=(6, 4)) plt.bar(range(4), var_exp, alpha=0.5, align='center', label='individual explained variance') plt.step(range(4), cum_var_exp, where='mid', label='cumulative explained variance') plt.ylabel('Explained variance ratio') plt.xlabel('Principal components') plt.xticks(range(4)) ax.set_xticklabels(np.arange(1, X.shape[1] + 1)) plt.legend(loc='best') plt.tight_layout()","title":"Example 2 - Plotting the Variance Explained 
Ratio"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#example-3-pca-via-svd","text":"While the eigendecomposition of the covariance or correlation matrix may be more intuitive, most PCA implementations perform a Singular Value Decomposition (SVD) to improve the computational efficiency. Another advantage of using SVD is that the results tend to be more numerically stable, since we can decompose the input matrix directly without the additional covariance-matrix step. from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import PrincipalComponentAnalysis X, y = iris_data() X = standardize(X) pca = PrincipalComponentAnalysis(n_components=2, solver='svd') pca.fit(X) X_pca = pca.transform(X) import matplotlib.pyplot as plt with plt.style.context('seaborn-whitegrid'): plt.figure(figsize=(6, 4)) for lab, col in zip((0, 1, 2), ('blue', 'red', 'green')): plt.scatter(X_pca[y==lab, 0], X_pca[y==lab, 1], label=lab, c=col) plt.xlabel('Principal Component 1') plt.ylabel('Principal Component 2') plt.legend(loc='lower center') plt.tight_layout() plt.show() If we compare this PCA projection to the plot in Example 1, we notice that the two are mirror images of each other. This is not due to an error in either implementation; the difference arises because, depending on the eigensolver, eigenvectors can have either negative or positive signs. For instance, if v is an eigenvector of a matrix \\Sigma with eigenvalue \\lambda , so that \\Sigma v = \\lambda v, then -v is also an eigenvector with the same eigenvalue, since \\Sigma(-v) = -\\Sigma v = -\\lambda v = \\lambda(-v).","title":"Example 3 - PCA via SVD"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#example-4-factor-loadings","text":"After calling the fit method, the factor loadings are available via the loadings_ attribute. 
In simple terms, the loadings are the unstandardized values of the eigenvectors. In other words, we can interpret the loadings as the covariances (or correlations, if the input features were standardized) between the input features and the principal components (or eigenvectors), which have been scaled to unit length. Because the loadings are scaled, they are comparable in magnitude, and we can assess how much of the variance in a component is attributable to each input feature (as the components are just weighted linear combinations of the input features). from mlxtend.data import iris_data from mlxtend.preprocessing import standardize from mlxtend.feature_extraction import PrincipalComponentAnalysis import matplotlib.pyplot as plt X, y = iris_data() X = standardize(X) pca = PrincipalComponentAnalysis(n_components=2, solver='eigen') pca.fit(X); xlabels = ['sepal length', 'sepal width', 'petal length', 'petal width'] fig, ax = plt.subplots(1, 2, figsize=(8, 3)) ax[0].bar(range(4), pca.loadings_[:, 0], align='center') ax[1].bar(range(4), pca.loadings_[:, 1], align='center') ax[0].set_ylabel('Factor loading onto PC1') ax[1].set_ylabel('Factor loading onto PC2') ax[0].set_xticks(range(4)) ax[1].set_xticks(range(4)) ax[0].set_xticklabels(xlabels, rotation=45) ax[1].set_xticklabels(xlabels, rotation=45) ax[0].set_ylim([-1, 1]) ax[1].set_ylim([-1, 1]) plt.tight_layout() For instance, we may say that most of the variance in the first component is attributed to the petal features (although the loading of sepal length on PC1 is only slightly smaller in magnitude). In contrast, the remaining variance captured by PC2 is mostly due to the sepal width. 
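As a rough numerical check of this interpretation, the sketch below computes loadings as eigenvectors scaled by the square roots of their eigenvalues and compares them with the correlations between the standardized features and the principal component scores. It uses a small synthetic dataset instead of Iris, and it is an assumption on our part that this scaling corresponds to what mlxtend stores in loadings_.

```python
import numpy as np

# Synthetic, correlated toy data (a stand-in for the standardized Iris features)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]  # introduce correlation between features 0 and 1

# Standardize with the population standard deviation (ddof=0)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# The covariance matrix of standardized data is the correlation matrix
cov = np.cov(X_std, rowvar=False, bias=True)

# Eigendecomposition, re-sorted by descending eigenvalue (eigh sorts ascending)
e_vals, e_vecs = np.linalg.eigh(cov)
order = np.argsort(e_vals)[::-1]
e_vals, e_vecs = e_vals[order], e_vecs[:, order]

# Loadings: each eigenvector (column) scaled by the square root of its eigenvalue
loadings = e_vecs * np.sqrt(e_vals)

# Principal component scores and their correlation with each input feature
scores = X_std @ e_vecs
n_features = X.shape[1]
corr = np.array([
    [np.corrcoef(X_std[:, j], scores[:, k])[0, 1] for k in range(n_features)]
    for j in range(n_features)
])

print(np.allclose(loadings, corr))  # True: loadings equal feature-PC correlations
```

Because each standardized feature has unit variance, the squared loadings in each row sum to 1, and the squared loadings in each column sum to that component's eigenvalue; both follow from the eigenvectors being orthonormal.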
We know from Example 2 that PC1 explains most of the variance; based on the loading plots, we may therefore say that the petal features, combined with sepal length, explain most of the spread in the data.","title":"Example 4 - Factor Loadings"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#api","text":"PrincipalComponentAnalysis(n_components=None, solver='eigen') Principal Component Analysis Class Parameters n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None . solver : str (default: 'eigen') Method for performing the matrix decomposition. {'eigen', 'svd'} Attributes w_ : array-like, shape=[n_features, n_components] Projection matrix e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. loadings_ : array_like, shape=[n_features, n_features] The factor loadings of the original variables onto the principal components. The columns are the principal components, and the rows are the feature loadings. For instance, the first column contains the loadings onto the first principal component. Note that the signs may be flipped depending on whether you use the 'eigen' or 'svd' solver; this does not affect the interpretation of the loadings though. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/PrincipalComponentAnalysis/","title":"API"},{"location":"user_guide/feature_extraction/PrincipalComponentAnalysis/#methods","text":"fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object transform(X) Apply the linear transformation on X. 
Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"Methods"},{"location":"user_guide/feature_extraction/RBFKernelPCA/","text":"RBF Kernel Principal Component Analysis Implementation of RBF Kernel Principal Component Analysis for non-linear dimensionality reduction from mlxtend.feature_extraction import RBFKernelPCA Overview Most machine learning algorithms have been developed and statistically validated for linearly separable data. Popular examples are linear classifiers like Support Vector Machines (SVMs) or the (standard) Principal Component Analysis (PCA) for dimensionality reduction. However, most real-world data requires nonlinear methods in order to perform tasks that involve the analysis and discovery of patterns successfully. The focus of this overview is to briefly introduce the idea of kernel methods and to implement a Gaussian radial basis function (RBF) kernel that is used to perform nonlinear dimensionality reduction via RBF kernel principal component analysis (kPCA). Principal Component Analysis The main purpose of principal component analysis (PCA) is the analysis of data to identify patterns that represent the data \u201cwell.\u201d The principal components can be understood as new axes of the dataset that maximize the variance along those axes (the eigenvectors of the covariance matrix). In other words, PCA aims to find the axes of maximum variance, along which the data is most spread out. For more details, please see the related article on mlxtend.feature_extraction.PrincipalComponentAnalysis . Nonlinear dimensionality reduction The \u201cclassic\u201d PCA approach described above is a linear projection technique that works well if the data is linearly separable. 
However, in the case of linearly inseparable data, a nonlinear technique is required if the task is to reduce the dimensionality of a dataset. Kernel functions and the kernel trick The basic idea to deal with linearly inseparable data is to project it onto a higher-dimensional space where it becomes linearly separable. Let us call this nonlinear mapping function \\phi , so that the mapping of a sample \\mathbf{x} can be written as \\mathbf{x} \\rightarrow \\phi (\\mathbf{x}) . The term \"kernel\" then describes a function that calculates the dot product of the images of the samples \\mathbf{x} under \\phi : \\kappa(\\mathbf{x_i, x_j}) = \\phi (\\mathbf{x_i}) \\phi (\\mathbf{x_j})^T More details about the derivation of this equation are provided in this excellent review article by Quan Wang: Kernel Principal Component Analysis and its Applications in Face Recognition and Active Shape Models .[ 1 ] In other words, the function \\phi maps the original d-dimensional features into a larger, k-dimensional feature space by creating nonlinear combinations of the original features. For example, if \\mathbf{x} consists of 2 features: \\mathbf{x} = \\big[x_1 \\quad x_2\\big]^T \\quad \\quad \\mathbf{x} \\in I\\!R^d \\Downarrow \\phi \\mathbf{x}' = \\big[x_1 \\quad x_2 \\quad x_1 x_2 \\quad x_{1}^2 \\quad x_1 x_{2}^3 \\quad \\dots \\big]^T \\quad \\quad \\mathbf{x}' \\in I\\!R^k \\quad (k \\gg d) Often, the mathematical definition of the RBF kernel is written and implemented as \\kappa(\\mathbf{x_i, x_j}) = exp\\bigg(- \\gamma \\; \\lVert\\mathbf{x_i - x_j }\\rVert^{2}_{2} \\bigg) where \\textstyle\\gamma = \\tfrac{1}{2\\sigma^2} is a free parameter that is to be optimized. Gaussian radial basis function (RBF) Kernel PCA In the linear PCA approach, we are interested in the principal components that maximize the variance in the dataset. 
This is done by extracting the eigenvectors (principal components) that correspond to the largest eigenvalues of the covariance matrix: \\text{Cov} = \\frac{1}{N} \\sum_{i=1}^{N} \\mathbf{x_i} \\mathbf{x_i}^T Bernhard Scholkopf ( Kernel Principal Component Analysis [ 2 ]) generalized this approach to data that was mapped onto the higher-dimensional space via a kernel function: \\text{Cov} = \\frac{1}{N} \\sum_{i=1}^{N} \\phi(\\mathbf{x_i}) \\phi(\\mathbf{x_i})^T However, in practice the covariance matrix in the higher-dimensional space is not calculated explicitly (this is the kernel trick). Therefore, the implementation of RBF kernel PCA does not yield the principal component axes (in contrast to the standard PCA), but the obtained eigenvectors can be understood as projections of the data onto the principal components. RBF kernel PCA step-by-step 1. Computation of the kernel (similarity) matrix. In this first step, we need to calculate \\kappa(\\mathbf{x_i, x_j}) = exp\\bigg(- \\gamma \\; \\lVert\\mathbf{x_i - x_j }\\rVert^{2}_{2} \\bigg) for every pair of points. E.g., if we have a dataset of 100 samples, this step would result in a symmetric 100x100 kernel matrix. 2. Eigendecomposition of the kernel matrix. Since the kernel matrix is not guaranteed to be centered, we center it via K' = K - \\mathbf{1_N} K - K \\mathbf{1_N} + \\mathbf{1_N} K \\mathbf{1_N} where \\mathbf{1_N} is (like the kernel matrix) an N\\times N matrix with all values equal to \\frac{1}{N} . [ 3 ] Next, we obtain the eigenvectors of the centered kernel matrix that correspond to the largest eigenvalues. Those eigenvectors are the data points already projected onto the respective principal components. Projecting new data So far, in the sections above, we have projected a dataset onto a new feature subspace. 
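The two fitting steps described above (building the RBF kernel matrix, then centering it and extracting the top eigenvectors) can be sketched in plain NumPy. This is a minimal illustration written under our own assumptions, not the source of mlxtend's RBFKernelPCA; the helper name rbf_kernel_pca and the random toy data are hypothetical, and the sign of each returned component is arbitrary, as with standard PCA.

```python
import numpy as np


def rbf_kernel_pca(X, gamma, n_components):
    # Hypothetical helper: returns the training samples projected onto the
    # top n_components kernel principal components, plus their eigenvalues.

    # Step 1: pairwise squared Euclidean distances and the RBF kernel matrix
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    K = np.exp(-gamma * sq_dists)

    # Step 2a: center the kernel matrix, K' = K - 1N K - K 1N + 1N K 1N
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Step 2b: eigendecomposition (eigh, since K_c is symmetric),
    # keeping the eigenvectors with the largest eigenvalues
    e_vals, e_vecs = np.linalg.eigh(K_c)
    order = np.argsort(e_vals)[::-1][:n_components]

    # These eigenvectors already are the projected training samples
    return e_vecs[:, order], e_vals[order]


# Usage on 100 hypothetical 2D samples; gamma=15.0 mirrors the examples
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
X_kpca, lambdas = rbf_kernel_pca(X, gamma=15.0, n_components=2)
print(X_kpca.shape)  # (100, 2), from a 100x100 kernel matrix
```

numpy.linalg.eigh returns eigenvalues in ascending order, which is why the sketch re-sorts them explicitly before selecting the top components.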
However, in a real application, we are usually interested in mapping new data points onto the same feature subspace (e.g., if we are working with a training and a test dataset in pattern classification tasks). Remember, when we computed the eigenvectors \\mathbf{\\alpha} of the centered kernel matrix, those values were actually already the data points projected onto the principal component axis \\mathbf{g} . If we want to project a new data point \\mathbf{x} onto this principal component axis, we'd need to compute \\phi(\\mathbf{x})^T \\mathbf{g} . Fortunately, here too, we don't have to compute \\phi(\\mathbf{x})^T \\mathbf{g} explicitly; instead, we use the kernel trick to calculate the RBF kernel between the new data point and every data point j in the training dataset: \\phi(\\mathbf{x})^T \\mathbf{g} = \\sum_j \\alpha_{j} \\; \\phi(\\mathbf{x}) \\; \\phi(\\mathbf{x_j})^T = \\sum_j \\alpha_{j} \\; \\kappa(\\mathbf{x}, \\mathbf{x_j}) Since the eigenvectors \\alpha and eigenvalues \\lambda of the kernel matrix \\mathbf{K} satisfy the equation \\mathbf{K} \\alpha = \\lambda \\alpha , we just need to normalize the eigenvector by the corresponding eigenvalue. References [1] Q. Wang. Kernel principal component analysis and its applications in face recognition and active shape models . CoRR, abs/1207.3538, 2012. [2] B. Scholkopf, A. Smola, and K.-R. Muller. Kernel principal component analysis . pages 583\u2013588, 1997. [3] B. Scholkopf, A. Smola, and K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem . Neural Computation, 10(5):1299\u20131319, 1998. Example 1 - Half-moon shapes We will start with a simple example of 2 half-moon shapes generated by the make_moons function from scikit-learn. 
import matplotlib.pyplot as plt from sklearn.datasets import make_moons X, y = make_moons(n_samples=50, random_state=1) plt.scatter(X[y==0, 0], X[y==0, 1], color='red', marker='o', alpha=0.5) plt.scatter(X[y==1, 0], X[y==1, 1], color='blue', marker='^', alpha=0.5) plt.ylabel('y coordinate') plt.xlabel('x coordinate') plt.show() Since the two half-moon shapes are linearly inseparable, we expect that the \u201cclassic\u201d PCA will fail to give us a \u201cgood\u201d representation of the data in 1D space. Let us use the PCA class to perform the dimensionality reduction. from mlxtend.feature_extraction import PrincipalComponentAnalysis as PCA pca = PCA(n_components=2) X_pca = pca.fit(X).transform(X) plt.scatter(X_pca[y==0, 0], X_pca[y==0, 1], color='red', marker='o', alpha=0.5) plt.scatter(X_pca[y==1, 0], X_pca[y==1, 1], color='blue', marker='^', alpha=0.5) plt.xlabel('PC1') plt.ylabel('PC2') plt.show() As we can see, the resulting principal components do not yield a subspace where the data is linearly separated well. Note that PCA is an unsupervised method: in contrast to Linear Discriminant Analysis, it does not \u201cconsider\u201d class labels when maximizing the variance. Here, the colors blue and red are just added for visualization purposes to indicate the degree of separation. Next, we will perform dimensionality reduction via RBF kernel PCA on our half-moon data. The choice of \\gamma depends on the dataset and can be obtained via hyperparameter tuning techniques like grid search. Hyperparameter tuning is a broad topic in itself, and here I will just use a \\gamma -value that I found to produce \u201cgood\u201d results. 
from mlxtend.feature_extraction import RBFKernelPCA as KPCA kpca = KPCA(gamma=15.0, n_components=2) kpca.fit(X) X_kpca = kpca.X_projected_ Please note that the components of kernel methods such as RBF kernel PCA already represent the projected data points (in contrast to PCA, where the component axes are the \"top k\" eigenvectors that are used to construct a projection matrix, which is then used to transform the training samples). Thus, after fitting, the projected training set is available via the .X_projected_ attribute. plt.scatter(X_kpca[y==0, 0], X_kpca[y==0, 1], color='red', marker='o', alpha=0.5) plt.scatter(X_kpca[y==1, 0], X_kpca[y==1, 1], color='blue', marker='^', alpha=0.5) plt.title('First 2 principal components after RBF Kernel PCA') plt.xlabel('PC1') plt.ylabel('PC2') plt.show() The new feature space is linearly separable now. Since we are often interested in dimensionality reduction, let's have a look at the first component only. import numpy as np plt.scatter(X_kpca[y==0, 0], np.zeros((25, 1)), color='red', marker='o', alpha=0.5) plt.scatter(X_kpca[y==1, 0], np.zeros((25, 1)), color='blue', marker='^', alpha=0.5) plt.title('First principal component after RBF Kernel PCA') plt.xlabel('PC1') plt.yticks([]) plt.show() We can clearly see that the projection via RBF kernel PCA yielded a subspace where the classes are separated well. Such a subspace can then be used as input for generalized linear classification models, e.g., logistic regression. Projecting new data Finally, via the transform method, we can project new data onto the new component axes. 
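To make the projection formula from the overview concrete, here is a self-contained NumPy sketch of what projecting new points amounts to. The helpers rbf_kernel, fit_rbf_kpca, and project_rbf_kpca are hypothetical names, not mlxtend API, and the random data stand in for the half-moon samples. Following the formula in the text, the kernel values between new and training points are not centered, so projecting the training points themselves reproduces the fitted eigenvectors only up to a constant offset per component. The last lines of the sketch check exactly that.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    # Pairwise RBF kernel exp(-gamma * ||a - b||^2) between the rows of A and B
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def fit_rbf_kpca(X, gamma, n_components):
    # Hypothetical helper: centered kernel matrix -> top eigenpairs
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    e_vals, e_vecs = np.linalg.eigh(K_c)
    order = np.argsort(e_vals)[::-1][:n_components]
    return e_vecs[:, order], e_vals[order]


def project_rbf_kpca(X_new, X_train, alphas, lambdas, gamma):
    # For each new x and component: sum_j (alpha_j / lambda) * kappa(x, x_j)
    k = rbf_kernel(X_new, X_train, gamma)
    return k @ (alphas / lambdas)


rng = np.random.default_rng(2)
X_train = rng.normal(size=(50, 2))
X_new = rng.normal(size=(5, 2))

alphas, lambdas = fit_rbf_kpca(X_train, gamma=1.0, n_components=2)
X_new_proj = project_rbf_kpca(X_new, X_train, alphas, lambdas, gamma=1.0)
print(X_new_proj.shape)  # (5, 2)

# Projecting the training points gives the fitted alphas plus a constant
# offset per component, because the new kernel rows are not centered
offset = project_rbf_kpca(X_train, X_train, alphas, lambdas, gamma=1.0) - alphas
print(np.allclose(offset, offset[0]))  # True
```

Dividing alphas by lambdas broadcasts over the columns, so each component's eigenvector is normalized by its own eigenvalue, matching the normalization step described in the overview.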
import matplotlib.pyplot as plt from sklearn.datasets import make_moons X2, y2 = make_moons(n_samples=200, random_state=5) X2_kpca = kpca.transform(X2) plt.scatter(X_kpca[y==0, 0], X_kpca[y==0, 1], color='red', marker='o', alpha=0.5, label='fit data') plt.scatter(X_kpca[y==1, 0], X_kpca[y==1, 1], color='blue', marker='^', alpha=0.5, label='fit data') plt.scatter(X2_kpca[y2==0, 0], X2_kpca[y2==0, 1], color='orange', marker='v', alpha=0.2, label='new data') plt.scatter(X2_kpca[y2==1, 0], X2_kpca[y2==1, 1], color='cyan', marker='s', alpha=0.2, label='new data') plt.legend() plt.show() Example 2 - Concentric circles Following the concepts explained in Example 1, let's have a look at another classic case: 2 concentric circles with random noise produced by scikit-learn\u2019s make_circles . from sklearn.datasets import make_circles X, y = make_circles(n_samples=1000, random_state=123, noise=0.1, factor=0.2) plt.figure(figsize=(8,6)) plt.scatter(X[y==0, 0], X[y==0, 1], color='red', alpha=0.5) plt.scatter(X[y==1, 0], X[y==1, 1], color='blue', alpha=0.5) plt.title('Concentric circles') plt.ylabel('y coordinate') plt.xlabel('x coordinate') plt.show() from mlxtend.feature_extraction import RBFKernelPCA as KPCA kpca = KPCA(gamma=15.0, n_components=2) kpca.fit(X) X_kpca = kpca.X_projected_ plt.scatter(X_kpca[y==0, 0], X_kpca[y==0, 1], color='red', marker='o', alpha=0.5) plt.scatter(X_kpca[y==1, 0], X_kpca[y==1, 1], color='blue', marker='^', alpha=0.5) plt.title('First 2 principal components after RBF Kernel PCA') plt.xlabel('PC1') plt.ylabel('PC2') plt.show() plt.scatter(X_kpca[y==0, 0], np.zeros((500, 1)), color='red', marker='o', alpha=0.5) plt.scatter(X_kpca[y==1, 0], np.zeros((500, 1)), color='blue', marker='^', alpha=0.5) plt.title('First principal component after RBF Kernel PCA') plt.xlabel('PC1') plt.yticks([]) plt.show() API RBFKernelPCA(gamma=15.0, n_components=None, copy_X=True) RBF
Kernel Principal Component Analysis for dimensionality reduction. Parameters gamma : float (default: 15.0) Free parameter (coefficient) of the RBF kernel. n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None . copy_X : bool (default: True) Copies training data, which is required to compute the projection of new data via the transform method. Uses a reference to X if False. Attributes e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. X_projected_ : array-like, shape=[n_samples, n_components] Training samples projected along the component axes. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/RBFKernelPCA/ Methods fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object transform(X) Apply the non-linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"RBFKernelPCA"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#rbf-kernel-principal-component-analysis","text":"Implementation of RBF Kernel Principal Component Analysis for non-linear dimensionality reduction from mlxtend.feature_extraction import RBFKernelPCA","title":"RBF Kernel Principal Component Analysis"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#overview","text":"Most machine learning algorithms have been developed and statistically validated for linearly separable data. 
Popular examples are linear classifiers like Support Vector Machines (SVMs) or the (standard) Principal Component Analysis (PCA) for dimensionality reduction. However, most real-world data requires nonlinear methods in order to perform tasks that involve the analysis and discovery of patterns successfully. The focus of this overview is to briefly introduce the idea of kernel methods and to implement a Gaussian radial basis function (RBF) kernel that is used to perform nonlinear dimensionality reduction via RBF kernel principal component analysis (kPCA).","title":"Overview"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#principal-component-analysis","text":"The main purpose of principal component analysis (PCA) is the analysis of data to identify patterns that represent the data \u201cwell.\u201d The principal components can be understood as new axes of the dataset that maximize the variance along those axes (the eigenvectors of the covariance matrix). In other words, PCA aims to find the axes of maximum variance, along which the data is most spread out. For more details, please see the related article on mlxtend.feature_extraction.PrincipalComponentAnalysis .","title":"Principal Component Analysis"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#nonlinear-dimensionality-reduction","text":"The \u201cclassic\u201d PCA approach described above is a linear projection technique that works well if the data is linearly separable. However, in the case of linearly inseparable data, a nonlinear technique is required if the task is to reduce the dimensionality of a dataset.","title":"Nonlinear dimensionality reduction"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#kernel-functions-and-the-kernel-trick","text":"The basic idea to deal with linearly inseparable data is to project it onto a higher-dimensional space where it becomes linearly separable. 
Let us call this nonlinear mapping function \\phi , so that the mapping of a sample \\mathbf{x} can be written as \\mathbf{x} \\rightarrow \\phi (\\mathbf{x}) . The term \"kernel\" then describes a function that calculates the dot product of the images of the samples \\mathbf{x} under \\phi : \\kappa(\\mathbf{x_i, x_j}) = \\phi (\\mathbf{x_i}) \\phi (\\mathbf{x_j})^T More details about the derivation of this equation are provided in this excellent review article by Quan Wang: Kernel Principal Component Analysis and its Applications in Face Recognition and Active Shape Models .[ 1 ] In other words, the function \\phi maps the original d-dimensional features into a larger, k-dimensional feature space by creating nonlinear combinations of the original features. For example, if \\mathbf{x} consists of 2 features: \\mathbf{x} = \\big[x_1 \\quad x_2\\big]^T \\quad \\quad \\mathbf{x} \\in I\\!R^d \\Downarrow \\phi \\mathbf{x}' = \\big[x_1 \\quad x_2 \\quad x_1 x_2 \\quad x_{1}^2 \\quad x_1 x_{2}^3 \\quad \\dots \\big]^T \\quad \\quad \\mathbf{x}' \\in I\\!R^k \\quad (k \\gg d) Often, the mathematical definition of the RBF kernel is written and implemented as \\kappa(\\mathbf{x_i, x_j}) = exp\\bigg(- \\gamma \\; \\lVert\\mathbf{x_i - x_j }\\rVert^{2}_{2} \\bigg) where \\textstyle\\gamma = \\tfrac{1}{2\\sigma^2} is a free parameter that is to be optimized.","title":"Kernel functions and the kernel trick"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#gaussian-radial-basis-function-rbf-kernel-pca","text":"In the linear PCA approach, we are interested in the principal components that maximize the variance in the dataset. 
This is done by extracting the eigenvectors (principal components) that correspond to the largest eigenvalues of the covariance matrix: \\text{Cov} = \\frac{1}{N} \\sum_{i=1}^{N} \\mathbf{x_i} \\mathbf{x_i}^T Bernhard Scholkopf ( Kernel Principal Component Analysis [ 2 ]) generalized this approach to data that was mapped onto the higher-dimensional space via a kernel function: \\text{Cov} = \\frac{1}{N} \\sum_{i=1}^{N} \\phi(\\mathbf{x_i}) \\phi(\\mathbf{x_i})^T However, in practice the covariance matrix in the higher-dimensional space is not calculated explicitly (this is the kernel trick). Therefore, the implementation of RBF kernel PCA does not yield the principal component axes (in contrast to the standard PCA), but the obtained eigenvectors can be understood as projections of the data onto the principal components.","title":"Gaussian radial basis function (RBF) Kernel PCA"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#rbf-kernel-pca-step-by-step","text":"","title":"RBF kernel PCA step-by-step"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#1-computation-of-the-kernel-similarity-matrix","text":"In this first step, we need to calculate \\kappa(\\mathbf{x_i, x_j}) = exp\\bigg(- \\gamma \\; \\lVert\\mathbf{x_i - x_j }\\rVert^{2}_{2} \\bigg) for every pair of points. E.g., if we have a dataset of 100 samples, this step would result in a symmetric 100x100 kernel matrix.","title":"1. Computation of the kernel (similarity) matrix."},{"location":"user_guide/feature_extraction/RBFKernelPCA/#2-eigendecomposition-of-the-kernel-matrix","text":"Since the kernel matrix is not guaranteed to be centered, we center it via K' = K - \\mathbf{1_N} K - K \\mathbf{1_N} + \\mathbf{1_N} K \\mathbf{1_N} where \\mathbf{1_N} is (like the kernel matrix) an N\\times N matrix with all values equal to \\frac{1}{N} . [ 3 ] Next, we obtain the eigenvectors of the centered kernel matrix that correspond to the largest eigenvalues. 
Those eigenvectors are the data points already projected onto the respective principal components.","title":"2. Eigendecomposition of the kernel matrix."},{"location":"user_guide/feature_extraction/RBFKernelPCA/#projecting-new-data","text":"So far, in the sections above, we have projected a dataset onto a new feature subspace. However, in a real application, we are usually interested in mapping new data points onto the same feature subspace (e.g., if we are working with a training and a test dataset in pattern classification tasks). Remember, when we computed the eigenvectors \\mathbf{\\alpha} of the centered kernel matrix, those values were actually already the data points projected onto the principal component axis \\mathbf{g} . If we want to project a new data point \\mathbf{x} onto this principal component axis, we'd need to compute \\phi(\\mathbf{x})^T \\mathbf{g} . Fortunately, here too, we don't have to compute \\phi(\\mathbf{x})^T \\mathbf{g} explicitly; instead, we use the kernel trick to calculate the RBF kernel between the new data point and every data point j in the training dataset: \\phi(\\mathbf{x})^T \\mathbf{g} = \\sum_j \\alpha_{j} \\; \\phi(\\mathbf{x}) \\; \\phi(\\mathbf{x_j})^T = \\sum_j \\alpha_{j} \\; \\kappa(\\mathbf{x}, \\mathbf{x_j}) Since the eigenvectors \\alpha and eigenvalues \\lambda of the kernel matrix \\mathbf{K} satisfy the equation \\mathbf{K} \\alpha = \\lambda \\alpha , we just need to normalize the eigenvector by the corresponding eigenvalue.","title":"Projecting new data"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#references","text":"[1] Q. Wang. Kernel principal component analysis and its applications in face recognition and active shape models . CoRR, abs/1207.3538, 2012. [2] B. Scholkopf, A. Smola, and K.-R. Muller. Kernel principal component analysis . pages 583\u2013588, 1997. [3] B. Scholkopf, A. Smola, and K.-R. Muller. Nonlinear component analysis as a kernel eigenvalue problem . 
Neural Computation, 10(5):1299\u20131319, 1998.","title":"References"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#example-1-half-moon-shapes","text":"We will start with a simple example of 2 half-moon shapes generated by the make_moons function from scikit-learn. import matplotlib.pyplot as plt from sklearn.datasets import make_moons X, y = make_moons(n_samples=50, random_state=1) plt.scatter(X[y==0, 0], X[y==0, 1], color='red', marker='o', alpha=0.5) plt.scatter(X[y==1, 0], X[y==1, 1], color='blue', marker='^', alpha=0.5) plt.ylabel('y coordinate') plt.xlabel('x coordinate') plt.show() Since the two half-moon shapes are linearly inseparable, we expect that the \u201cclassic\u201d PCA will fail to give us a \u201cgood\u201d representation of the data in 1D space. Let us use the PCA class to perform the dimensionality reduction. from mlxtend.feature_extraction import PrincipalComponentAnalysis as PCA pca = PCA(n_components=2) X_pca = pca.fit(X).transform(X) plt.scatter(X_pca[y==0, 0], X_pca[y==0, 1], color='red', marker='o', alpha=0.5) plt.scatter(X_pca[y==1, 0], X_pca[y==1, 1], color='blue', marker='^', alpha=0.5) plt.xlabel('PC1') plt.ylabel('PC2') plt.show() As we can see, the resulting principal components do not yield a subspace where the data is linearly separated well. Note that PCA is an unsupervised method: in contrast to Linear Discriminant Analysis, it does not \u201cconsider\u201d class labels when maximizing the variance. Here, the colors blue and red are just added for visualization purposes to indicate the degree of separation. Next, we will perform dimensionality reduction via RBF kernel PCA on our half-moon data. The choice of \\gamma depends on the dataset and can be obtained via hyperparameter tuning techniques like grid search. Hyperparameter tuning is a broad topic in itself, and here I will just use a \\gamma -value that I found to produce \u201cgood\u201d results. 
from mlxtend.feature_extraction import RBFKernelPCA as KPCA kpca = KPCA(gamma=15.0, n_components=2) kpca.fit(X) X_kpca = kpca.X_projected_ Please note that the components of kernel methods such as RBF kernel PCA already represent the projected data points (in contrast to PCA, where the component axes are the \"top k\" eigenvectors that are used to construct a projection matrix, which is then used to transform the training samples). Thus, the projected training set is available after fitting via the .X_projected_ attribute. plt.scatter(X_kpca[y==0, 0], X_kpca[y==0, 1], color='red', marker='o', alpha=0.5) plt.scatter(X_kpca[y==1, 0], X_kpca[y==1, 1], color='blue', marker='^', alpha=0.5) plt.title('First 2 principal components after RBF Kernel PCA') plt.xlabel('PC1') plt.ylabel('PC2') plt.show() The new feature space is linearly separable now. Since we are often interested in dimensionality reduction, let's have a look at the first component only. import numpy as np plt.scatter(X_kpca[y==0, 0], np.zeros((25, 1)), color='red', marker='o', alpha=0.5) plt.scatter(X_kpca[y==1, 0], np.zeros((25, 1)), color='blue', marker='^', alpha=0.5) plt.title('First principal component after RBF Kernel PCA') plt.xlabel('PC1') plt.yticks([]) plt.show() We can clearly see that the projection via RBF kernel PCA yielded a subspace where the classes are separated well. Such a subspace can then be used as input for generalized linear classification models, e.g., logistic regression.","title":"Example 1 - Half-moon shapes"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#projecting-new-data_1","text":"Finally, via the transform method, we can project new data onto the new component axes. 
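To make the projection formula from the Projecting new data section concrete, here is a minimal NumPy sketch of RBF kernel PCA with a transform step. It is not mlxtend's implementation: the helper names rbf_kernel, kpca_fit and kpca_transform are hypothetical, and centering the kernel rows of new points with the training-kernel statistics is an assumption that goes beyond the text. As with .X_projected_, the fitted projection is simply the top eigenvectors of the centered kernel matrix.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Squared Euclidean distance between every row of A and every row of B
    sq_dists = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kpca_fit(X, gamma, n_components):
    K = rbf_kernel(X, X, gamma)
    # Center the kernel matrix in feature space
    K_c = K - K.mean(axis=0) - K.mean(axis=1, keepdims=True) + K.mean()
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(K_c)
    top = np.argsort(eigvals)[::-1][:n_components]
    return {
        'X_fit': X,
        'K_fit': K,
        'gamma': gamma,
        'lambdas': eigvals[top],
        # The eigenvectors double as the projected training points
        'alphas': eigvecs[:, top],
    }

def kpca_transform(model, X_new):
    K_fit = model['K_fit']
    K_new = rbf_kernel(X_new, model['X_fit'], model['gamma'])
    # Assumption: center the new kernel rows with the training-kernel statistics
    K_new_c = (K_new - K_fit.mean(axis=0)
               - K_new.mean(axis=1, keepdims=True) + K_fit.mean())
    # sum_j alpha_j * kappa(x, x_j), divided by the matching eigenvalue
    return K_new_c @ model['alphas'] / model['lambdas']

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2))
model = kpca_fit(X_train, gamma=1.0, n_components=2)
# Projecting the training points again recovers the fitted eigenvectors
print(np.allclose(kpca_transform(model, X_train), model['alphas']))
```

Because the centered kernel matrix satisfies K_c @ alpha = lambda * alpha, running kpca_transform on the training points reproduces the fitted eigenvectors, which makes a convenient sanity check. With the half-moon data, kpca_fit(X, gamma=15.0, n_components=2) would play the role of the fitted KPCA object.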
import matplotlib.pyplot as plt from sklearn.datasets import make_moons X2, y2 = make_moons(n_samples=200, random_state=5) X2_kpca = kpca.transform(X2) plt.scatter(X_kpca[y==0, 0], X_kpca[y==0, 1], color='red', marker='o', alpha=0.5, label='fit data') plt.scatter(X_kpca[y==1, 0], X_kpca[y==1, 1], color='blue', marker='^', alpha=0.5, label='fit data') plt.scatter(X2_kpca[y2==0, 0], X2_kpca[y2==0, 1], color='orange', marker='v', alpha=0.2, label='new data') plt.scatter(X2_kpca[y2==1, 0], X2_kpca[y2==1, 1], color='cyan', marker='s', alpha=0.2, label='new data') plt.legend() plt.show()","title":"Projecting new data"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#example-2-concentric-circles","text":"Following the concepts explained in Example 1, let's have a look at another classic case: 2 concentric circles with random noise produced by scikit-learn\u2019s make_circles . from sklearn.datasets import make_circles X, y = make_circles(n_samples=1000, random_state=123, noise=0.1, factor=0.2) plt.figure(figsize=(8,6)) plt.scatter(X[y==0, 0], X[y==0, 1], color='red', alpha=0.5) plt.scatter(X[y==1, 0], X[y==1, 1], color='blue', alpha=0.5) plt.title('Concentric circles') plt.ylabel('y coordinate') plt.xlabel('x coordinate') plt.show() from mlxtend.feature_extraction import RBFKernelPCA as KPCA kpca = KPCA(gamma=15.0, n_components=2) kpca.fit(X) X_kpca = kpca.X_projected_ plt.scatter(X_kpca[y==0, 0], X_kpca[y==0, 1], color='red', marker='o', alpha=0.5) plt.scatter(X_kpca[y==1, 0], X_kpca[y==1, 1], color='blue', marker='^', alpha=0.5) plt.title('First 2 principal components after RBF Kernel PCA') plt.xlabel('PC1') plt.ylabel('PC2') plt.show() plt.scatter(X_kpca[y==0, 0], np.zeros((500, 1)), color='red', marker='o', alpha=0.5) plt.scatter(X_kpca[y==1, 0], np.zeros((500, 1)), color='blue', marker='^', alpha=0.5) plt.title('First principal component after RBF Kernel PCA') plt.xlabel('PC1') 
plt.yticks([]) plt.show()","title":"Example 2 - Concentric circles"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#api","text":"RBFKernelPCA(gamma=15.0, n_components=None, copy_X=True) RBF Kernel Principal Component Analysis for dimensionality reduction. Parameters gamma : float (default: 15.0) Free parameter (coefficient) of the RBF kernel. n_components : int (default: None) The number of principal components for transformation. Keeps the original dimensions of the dataset if None . copy_X : bool (default: True) Copies training data, which is required to compute the projection of new data via the transform method. Uses a reference to X if False. Attributes e_vals_ : array-like, shape=[n_features] Eigenvalues in sorted order. e_vecs_ : array-like, shape=[n_features] Eigenvectors in sorted order. X_projected_ : array-like, shape=[n_samples, n_components] Training samples projected along the component axes. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_extraction/RBFKernelPCA/","title":"API"},{"location":"user_guide/feature_extraction/RBFKernelPCA/#methods","text":"fit(X) Learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns self : object transform(X) Apply the non-linear transformation on X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. Returns X_projected : np.ndarray, shape = [n_samples, n_components] Projected training vectors.","title":"Methods"},{"location":"user_guide/feature_selection/ColumnSelector/","text":"ColumnSelector Implementation of a column selector class for scikit-learn pipelines. 
from mlxtend.feature_selection import ColumnSelector Overview The ColumnSelector can be used for \"manual\" feature selection, e.g., as part of a grid search via a scikit-learn pipeline. References - Example 1 - Fitting an Estimator on a Feature Subset Load a simple benchmark dataset: from sklearn.datasets import load_iris iris = load_iris() X = iris.data y = iris.target The ColumnSelector is a simple transformer class that selects specific columns (features) from a dataset. For instance, using the transform method returns a reduced dataset that only contains two features (here: the first two features via the indices 0 and 1, respectively): from mlxtend.feature_selection import ColumnSelector col_selector = ColumnSelector(cols=(0, 1)) # col_selector.fit(X) # optional, does not do anything col_selector.transform(X).shape (150, 2) ColumnSelector works both with numpy arrays and pandas dataframes: import pandas as pd iris_df = pd.DataFrame(iris.data, columns=iris.feature_names) iris_df.head() sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) 0 5.1 3.5 1.4 0.2 1 4.9 3.0 1.4 0.2 2 4.7 3.2 1.3 0.2 3 4.6 3.1 1.5 0.2 4 5.0 3.6 1.4 0.2 col_selector = ColumnSelector(cols=(\"sepal length (cm)\", \"sepal width (cm)\")) col_selector.transform(iris_df).shape (150, 2) Similarly, we can use the ColumnSelector as part of a scikit-learn Pipeline : from sklearn.neighbors import KNeighborsClassifier from sklearn.preprocessing import StandardScaler from sklearn.pipeline import make_pipeline pipe = make_pipeline(StandardScaler(), ColumnSelector(cols=(0, 1)), KNeighborsClassifier()) pipe.fit(X, y) pipe.score(X, y) 0.83999999999999997 Example 2 - Feature Selection via GridSearch Example 1 showed a simple usage example of the ColumnSelector ; however, selecting columns from a dataset is trivial and does not require a 
specific transformer class since we could have achieved the same result via classifier.fit(X[:, :2], y) classifier.score(X[:, :2], y) However, the ColumnSelector becomes really useful for feature selection as part of a grid search as shown in this example. Load a simple benchmark dataset: from sklearn.datasets import load_iris iris = load_iris() X = iris.data y = iris.target Create all possible combinations: from itertools import combinations all_comb = [] for size in range(1, 5): all_comb += list(combinations(range(X.shape[1]), r=size)) print(all_comb) [(0,), (1,), (2,), (3,), (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3), (0, 1, 2, 3)] Feature and model selection via grid search: from mlxtend.feature_selection import ColumnSelector from sklearn.neighbors import KNeighborsClassifier from sklearn.preprocessing import StandardScaler from sklearn.model_selection import GridSearchCV from sklearn.pipeline import make_pipeline pipe = make_pipeline(StandardScaler(), ColumnSelector(), KNeighborsClassifier()) param_grid = {'columnselector__cols': all_comb, 'kneighborsclassifier__n_neighbors': list(range(1, 11))} grid = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1) grid.fit(X, y) print('Best parameters:', grid.best_params_) print('Best performance:', grid.best_score_) Best parameters: {'columnselector__cols': (2, 3), 'kneighborsclassifier__n_neighbors': 1} Best performance: 0.98 API ColumnSelector(cols=None, drop_axis=False) Object for selecting specific columns from a data set. Parameters cols : array-like (default: None) A list specifying the feature indices to be selected. For example, [1, 4, 5] to select the 2nd, 5th, and 6th feature columns. If None, returns all columns in the array. drop_axis : bool (default=False) Drops last axis if True and only one column is selected. 
This is useful, e.g., when the ColumnSelector is used for selecting only one column and the resulting array should be fed to, e.g., a scikit-learn column selector. E.g., instead of returning an array with shape (n_samples, 1), drop_axis=True will return an array with shape (n_samples,). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ColumnSelector/ Methods fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features","title":"ColumnSelector"},{"location":"user_guide/feature_selection/ColumnSelector/#columnselector","text":"Implementation of a column selector class for scikit-learn pipelines. from mlxtend.feature_selection import ColumnSelector","title":"ColumnSelector"},{"location":"user_guide/feature_selection/ColumnSelector/#overview","text":"The ColumnSelector can be used for \"manual\" feature selection, e.g., as part of a grid search via a scikit-learn pipeline.","title":"Overview"},{"location":"user_guide/feature_selection/ColumnSelector/#references","text":"-","title":"References"},{"location":"user_guide/feature_selection/ColumnSelector/#example-1-fitting-an-estimator-on-a-feature-subset","text":"Load a simple benchmark dataset: from sklearn.datasets import load_iris iris = load_iris() X = iris.data y = iris.target The ColumnSelector is a simple transformer class that selects specific columns (features) from a dataset. 
For instance, using the transform method returns a reduced dataset that only contains two features (here: the first two features via the indices 0 and 1, respectively): from mlxtend.feature_selection import ColumnSelector col_selector = ColumnSelector(cols=(0, 1)) # col_selector.fit(X) # optional, does not do anything col_selector.transform(X).shape (150, 2) ColumnSelector works both with numpy arrays and pandas dataframes: import pandas as pd iris_df = pd.DataFrame(iris.data, columns=iris.feature_names) iris_df.head() sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) 0 5.1 3.5 1.4 0.2 1 4.9 3.0 1.4 0.2 2 4.7 3.2 1.3 0.2 3 4.6 3.1 1.5 0.2 4 5.0 3.6 1.4 0.2 col_selector = ColumnSelector(cols=(\"sepal length (cm)\", \"sepal width (cm)\")) col_selector.transform(iris_df).shape (150, 2) Similarly, we can use the ColumnSelector as part of a scikit-learn Pipeline : from sklearn.neighbors import KNeighborsClassifier from sklearn.preprocessing import StandardScaler from sklearn.pipeline import make_pipeline pipe = make_pipeline(StandardScaler(), ColumnSelector(cols=(0, 1)), KNeighborsClassifier()) pipe.fit(X, y) pipe.score(X, y) 0.83999999999999997","title":"Example 1 - Fitting an Estimator on a Feature Subset"},{"location":"user_guide/feature_selection/ColumnSelector/#example-2-feature-selection-via-gridsearch","text":"Example 1 showed a simple usage example of the ColumnSelector ; however, selecting columns from a dataset is trivial and does not require a specific transformer class since we could have achieved the same result via classifier.fit(X[:, :2], y) classifier.score(X[:, :2], y) However, the ColumnSelector becomes really useful for feature selection as part of a grid search as shown in this example. 
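Why can such a small class take part in a grid search? A pipeline step mainly needs fit and transform, and a grid search reads and assigns hyperparameters through get_params and set_params. The sketch below is a hypothetical stand-in called SimpleColumnSelector, not mlxtend's ColumnSelector; it only handles integer column indices on array-like input (no pandas column names) and mimics the documented drop_axis behavior.

```python
import numpy as np

class SimpleColumnSelector:
    # Hypothetical stand-in for illustration; not mlxtend's ColumnSelector.
    # Only integer column indices on array-like input are handled here.

    def __init__(self, cols=None, drop_axis=False):
        self.cols = cols
        self.drop_axis = drop_axis

    def get_params(self, deep=True):
        # Lets tools such as a grid search read (and clone) the hyperparameters
        return {'cols': self.cols, 'drop_axis': self.drop_axis}

    def set_params(self, **params):
        # Lets a grid search assign, e.g., cols=(2, 3) before fitting
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X, y=None):
        X = np.asarray(X)
        X_slice = X if self.cols is None else X[:, list(self.cols)]
        if self.drop_axis and X_slice.shape[1] == 1:
            X_slice = X_slice.ravel()  # (n_samples, 1) -> (n_samples,)
        return X_slice

X = np.arange(12).reshape(3, 4)
print(SimpleColumnSelector(cols=(0, 1)).transform(X).shape)  # (3, 2)
print(SimpleColumnSelector(cols=(2,), drop_axis=True).transform(X).tolist())  # [2, 6, 10]
```

In practice, inheriting from sklearn.base.BaseEstimator and TransformerMixin supplies get_params, set_params and fit_transform automatically, which is the more idiomatic route than writing them by hand as done here.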
Load a simple benchmark dataset: from sklearn.datasets import load_iris iris = load_iris() X = iris.data y = iris.target Create all possible combinations: from itertools import combinations all_comb = [] for size in range(1, 5): all_comb += list(combinations(range(X.shape[1]), r=size)) print(all_comb) [(0,), (1,), (2,), (3,), (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3), (0, 1, 2, 3)] Feature and model selection via grid search: from mlxtend.feature_selection import ColumnSelector from sklearn.neighbors import KNeighborsClassifier from sklearn.preprocessing import StandardScaler from sklearn.model_selection import GridSearchCV from sklearn.pipeline import make_pipeline pipe = make_pipeline(StandardScaler(), ColumnSelector(), KNeighborsClassifier()) param_grid = {'columnselector__cols': all_comb, 'kneighborsclassifier__n_neighbors': list(range(1, 11))} grid = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1) grid.fit(X, y) print('Best parameters:', grid.best_params_) print('Best performance:', grid.best_score_) Best parameters: {'columnselector__cols': (2, 3), 'kneighborsclassifier__n_neighbors': 1} Best performance: 0.98","title":"Example 2 - Feature Selection via GridSearch"},{"location":"user_guide/feature_selection/ColumnSelector/#api","text":"ColumnSelector(cols=None, drop_axis=False) Object for selecting specific columns from a data set. Parameters cols : array-like (default: None) A list specifying the feature indices to be selected. For example, [1, 4, 5] to select the 2nd, 5th, and 6th feature columns. If None, returns all columns in the array. drop_axis : bool (default=False) Drops last axis if True and only one column is selected. This is useful, e.g., when the ColumnSelector is used for selecting only one column and the resulting array should be fed to, e.g., a scikit-learn column selector. 
E.g., instead of returning an array with shape (n_samples, 1), drop_axis=True will return an array with shape (n_samples,). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ColumnSelector/","title":"API"},{"location":"user_guide/feature_selection/ColumnSelector/#methods","text":"fit(X, y=None) Mock method. Does nothing. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns self fit_transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X, y=None) Return a slice of the input array. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
y : array-like, shape = [n_samples] (default: None) Returns X_slice : shape = [n_samples, k_features] Subset of the feature space where k_features <= n_features","title":"Methods"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/","text":"Exhaustive Feature Selector Implementation of an exhaustive feature selector for sampling and evaluating all possible feature combinations in a specified range. from mlxtend.feature_selection import ExhaustiveFeatureSelector Overview This exhaustive feature selection algorithm is a wrapper approach for brute-force evaluation of feature subsets; the best subset is selected by optimizing a specified performance metric given an arbitrary regressor or classifier. For instance, if the classifier is a logistic regression and the dataset consists of 4 features, the algorithm will evaluate all 15 feature combinations (if min_features=1 and max_features=4 ) {0} {1} {2} {3} {0, 1} {0, 2} {0, 3} {1, 2} {1, 3} {2, 3} {0, 1, 2} {0, 1, 3} {0, 2, 3} {1, 2, 3} {0, 1, 2, 3} and select the one that results in the best performance (e.g., classification accuracy) of the logistic regression classifier. 
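The brute-force idea fits in a few lines. The sketch below is a hypothetical, dependency-free illustration, not mlxtend's implementation: exhaustive_search and score_fn are made-up names, score_fn stands in for the actual evaluation (mlxtend cross-validates the given estimator with the chosen scoring metric), and the subsets dictionary only loosely mirrors the layout of the subsets_ attribute.

```python
from itertools import combinations

def exhaustive_search(n_features, score_fn, min_features=1, max_features=None):
    # Evaluate score_fn on every feature subset whose size lies in
    # [min_features, max_features] and keep the best one.
    if max_features is None:
        max_features = n_features
    subsets = {}
    best_idx, best_score = None, float('-inf')
    for k in range(min_features, max_features + 1):
        for idx in combinations(range(n_features), k):
            score = score_fn(idx)
            subsets[len(subsets)] = {'feature_idx': idx, 'avg_score': score}
            if score > best_score:
                best_idx, best_score = idx, score
    return best_idx, best_score, subsets

# Toy scoring function for illustration only: each feature gets a made-up
# weight and a subset scores the sum of its weights. With scikit-learn, one
# could instead use something like
# lambda idx: cross_val_score(knn, X[:, list(idx)], y, cv=5).mean()
weights = [0.1, -0.2, 0.5, 0.4]
best_idx, best_score, subsets = exhaustive_search(
    4, lambda idx: sum(weights[i] for i in idx))
print(len(subsets))  # 15
print(best_idx)      # (0, 2, 3)
```

For 4 features and sizes 1 to 4 this enumerates the same 15 subsets listed above. In general the number of candidates is the sum of C(n, k) over the chosen size range (2^n - 1 for the full range), so the search becomes expensive quickly as the number of features grows.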
Example 1 - A simple Iris Example Initializing a simple classifier from scikit-learn: from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS iris = load_iris() X = iris.data y = iris.target knn = KNeighborsClassifier(n_neighbors=3) efs1 = EFS(knn, min_features=1, max_features=4, scoring='accuracy', print_progress=True, cv=5) efs1 = efs1.fit(X, y) print('Best accuracy score: %.2f' % efs1.best_score_) print('Best subset (indices):', efs1.best_idx_) print('Best subset (corresponding names):', efs1.best_feature_names_) Features: 15/15 Best accuracy score: 0.97 Best subset (indices): (0, 2, 3) Best subset (corresponding names): ('0', '2', '3') Note that in the example above, the 'best_feature_names_' are simply a string equivalent of the feature indices. However, we can provide custom feature names to the fit function for this mapping: feature_names = ('sepal length', 'sepal width', 'petal length', 'petal width') efs1 = efs1.fit(X, y, custom_feature_names=feature_names) print('Best subset (corresponding names):', efs1.best_feature_names_) Features: 15/15 Best subset (corresponding names): ('sepal length', 'petal length', 'petal width') Via the subsets_ attribute, we can take a look at every evaluated feature subset and its cross-validation scores: efs1.subsets_ {0: {'avg_score': 0.65999999999999992, 'cv_scores': array([ 0.53333333, 0.63333333, 0.73333333, 0.76666667, 0.63333333]), 'feature_idx': (0,), 'feature_names': ('sepal length',)}, 1: {'avg_score': 0.56666666666666665, 'cv_scores': array([ 0.53333333, 0.63333333, 0.6 , 0.5 , 0.56666667]), 'feature_idx': (1,), 'feature_names': ('sepal width',)}, 2: {'avg_score': 0.95333333333333337, 'cv_scores': array([ 0.93333333, 1. , 0.9 , 0.93333333, 1. ]), 'feature_idx': (2,), 'feature_names': ('petal length',)}, 3: {'avg_score': 0.94666666666666666, 'cv_scores': array([ 0.96666667, 0.96666667, 0.93333333, 0.86666667, 1. 
]), 'feature_idx': (3,), 'feature_names': ('petal width',)}, 4: {'avg_score': 0.72666666666666668, 'cv_scores': array([ 0.66666667, 0.8 , 0.63333333, 0.86666667, 0.66666667]), 'feature_idx': (0, 1), 'feature_names': ('sepal length', 'sepal width')}, 5: {'avg_score': 0.94666666666666666, 'cv_scores': array([ 0.96666667, 1. , 0.86666667, 0.93333333, 0.96666667]), 'feature_idx': (0, 2), 'feature_names': ('sepal length', 'petal length')}, 6: {'avg_score': 0.95333333333333337, 'cv_scores': array([ 0.96666667, 0.96666667, 0.9 , 0.93333333, 1. ]), 'feature_idx': (0, 3), 'feature_names': ('sepal length', 'petal width')}, 7: {'avg_score': 0.94666666666666666, 'cv_scores': array([ 0.96666667, 1. , 0.9 , 0.93333333, 0.93333333]), 'feature_idx': (1, 2), 'feature_names': ('sepal width', 'petal length')}, 8: {'avg_score': 0.94000000000000006, 'cv_scores': array([ 0.96666667, 0.96666667, 0.86666667, 0.93333333, 0.96666667]), 'feature_idx': (1, 3), 'feature_names': ('sepal width', 'petal width')}, 9: {'avg_score': 0.95333333333333337, 'cv_scores': array([ 0.96666667, 0.96666667, 0.9 , 0.93333333, 1. ]), 'feature_idx': (2, 3), 'feature_names': ('petal length', 'petal width')}, 10: {'avg_score': 0.94000000000000006, 'cv_scores': array([ 0.96666667, 0.96666667, 0.86666667, 0.93333333, 0.96666667]), 'feature_idx': (0, 1, 2), 'feature_names': ('sepal length', 'sepal width', 'petal length')}, 11: {'avg_score': 0.94666666666666666, 'cv_scores': array([ 0.93333333, 0.96666667, 0.9 , 0.93333333, 1. ]), 'feature_idx': (0, 1, 3), 'feature_names': ('sepal length', 'sepal width', 'petal width')}, 12: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.96666667, 0.96666667, 0.96666667, 0.96666667, 1. ]), 'feature_idx': (0, 2, 3), 'feature_names': ('sepal length', 'petal length', 'petal width')}, 13: {'avg_score': 0.95999999999999996, 'cv_scores': array([ 0.96666667, 0.96666667, 0.93333333, 0.93333333, 1. 
]), 'feature_idx': (1, 2, 3), 'feature_names': ('sepal width', 'petal length', 'petal width')}, 14: {'avg_score': 0.96666666666666679, 'cv_scores': array([ 0.96666667, 0.96666667, 0.93333333, 0.96666667, 1. ]), 'feature_idx': (0, 1, 2, 3), 'feature_names': ('sepal length', 'sepal width', 'petal length', 'petal width')}} Example 2 - Visualizing the feature selection results For convenience, we can visualize the output from the feature selection in a pandas DataFrame format using the get_metric_dict method of the ExhaustiveFeatureSelector object. The columns std_dev and std_err represent the standard deviation and standard error of the cross-validation scores, respectively, and ci_bound is the confidence interval bound of the average score. Below, we see the DataFrame for the exhaustive feature selection setup from Example 1: import pandas as pd iris = load_iris() X = iris.data y = iris.target knn = KNeighborsClassifier(n_neighbors=3) efs1 = EFS(knn, min_features=1, max_features=4, scoring='accuracy', print_progress=True, cv=5) feature_names = ('sepal length', 'sepal width', 'petal length', 'petal width') efs1 = efs1.fit(X, y, custom_feature_names=feature_names) df = pd.DataFrame.from_dict(efs1.get_metric_dict()).T df.sort_values('avg_score', inplace=True, ascending=False) df Features: 15/15 avg_score ci_bound cv_scores feature_idx feature_names std_dev std_err 12 0.973333 0.0171372 [0.966666666667, 0.966666666667, 0.96666666666... (0, 2, 3) (sepal length, petal length, petal width) 0.0133333 0.00666667 14 0.966667 0.0270963 [0.966666666667, 0.966666666667, 0.93333333333... (0, 1, 2, 3) (sepal length, sepal width, petal length, peta... 0.0210819 0.0105409 13 0.96 0.0320608 [0.966666666667, 0.966666666667, 0.93333333333... 
(1, 2, 3) (sepal width, petal length, petal width) 0.0249444 0.0124722 2 0.953333 0.0514116 [0.933333333333, 1.0, 0.9, 0.933333333333, 1.0] (2,) (petal length,) 0.04 0.02 6 0.953333 0.0436915 [0.966666666667, 0.966666666667, 0.9, 0.933333... (0, 3) (sepal length, petal width) 0.0339935 0.0169967 9 0.953333 0.0436915 [0.966666666667, 0.966666666667, 0.9, 0.933333... (2, 3) (petal length, petal width) 0.0339935 0.0169967 3 0.946667 0.0581151 [0.966666666667, 0.966666666667, 0.93333333333... (3,) (petal width,) 0.0452155 0.0226078 5 0.946667 0.0581151 [0.966666666667, 1.0, 0.866666666667, 0.933333... (0, 2) (sepal length, petal length) 0.0452155 0.0226078 7 0.946667 0.0436915 [0.966666666667, 1.0, 0.9, 0.933333333333, 0.9... (1, 2) (sepal width, petal length) 0.0339935 0.0169967 11 0.946667 0.0436915 [0.933333333333, 0.966666666667, 0.9, 0.933333... (0, 1, 3) (sepal length, sepal width, petal width) 0.0339935 0.0169967 8 0.94 0.0499631 [0.966666666667, 0.966666666667, 0.86666666666... (1, 3) (sepal width, petal width) 0.038873 0.0194365 10 0.94 0.0499631 [0.966666666667, 0.966666666667, 0.86666666666... (0, 1, 2) (sepal length, sepal width, petal length) 0.038873 0.0194365 4 0.726667 0.11623 [0.666666666667, 0.8, 0.633333333333, 0.866666... (0, 1) (sepal length, sepal width) 0.0904311 0.0452155 0 0.66 0.106334 [0.533333333333, 0.633333333333, 0.73333333333... (0,) (sepal length,) 0.0827312 0.0413656 1 0.566667 0.0605892 [0.533333333333, 0.633333333333, 0.6, 0.5, 0.5... 
(1,) (sepal width,) 0.0471405 0.0235702 import matplotlib.pyplot as plt metric_dict = efs1.get_metric_dict() fig = plt.figure() k_feat = sorted(metric_dict.keys()) avg = [metric_dict[k]['avg_score'] for k in k_feat] upper, lower = [], [] for k in k_feat: upper.append(metric_dict[k]['avg_score'] + metric_dict[k]['std_dev']) lower.append(metric_dict[k]['avg_score'] - metric_dict[k]['std_dev']) plt.fill_between(k_feat, upper, lower, alpha=0.2, color='blue', lw=1) plt.plot(k_feat, avg, color='blue', marker='o') plt.ylabel('Accuracy +/- Standard Deviation') plt.xlabel('Number of Features') feature_min = len(metric_dict[k_feat[0]]['feature_idx']) feature_max = len(metric_dict[k_feat[-1]]['feature_idx']) plt.xticks(k_feat, [str(metric_dict[k]['feature_names']) for k in k_feat], rotation=90) plt.show() Example 3 - Exhaustive Feature Selection for Regression Similar to the classification examples above, the ExhaustiveFeatureSelector also supports scikit-learn's estimators for regression. from sklearn.linear_model import LinearRegression from sklearn.datasets import load_boston boston = load_boston() X, y = boston.data, boston.target lr = LinearRegression() efs = EFS(lr, min_features=10, max_features=12, scoring='neg_mean_squared_error', cv=10) efs.fit(X, y) print('Best MSE score: %.2f' % (efs.best_score_ * (-1))) print('Best subset:', efs.best_idx_) Features: 377/377 Best subset: (0, 1, 4, 6, 7, 8, 9, 10, 11, 12) Example 4 - Using the Selected Feature Subset For Making New Predictions # Initialize the dataset from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=1) knn = KNeighborsClassifier(n_neighbors=3) # Select the \"best\" feature subset via # 5-fold cross-validation on the training set. 
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS efs1 = EFS(knn, min_features=1, max_features=4, scoring='accuracy', cv=5) efs1 = efs1.fit(X_train, y_train) Features: 15/15 print('Selected features:', efs1.best_idx_) Selected features: (2, 3) # Generate the new subsets based on the selected features # Note that the transform call is equivalent to # X_train[:, efs1.best_idx_] X_train_efs = efs1.transform(X_train) X_test_efs = efs1.transform(X_test) # Fit the estimator using the new feature subset # and make a prediction on the test data knn.fit(X_train_efs, y_train) y_pred = knn.predict(X_test_efs) # Compute the accuracy of the prediction acc = float((y_test == y_pred).sum()) / y_pred.shape[0] print('Test set accuracy: %.2f %%' % (acc*100)) Test set accuracy: 96.00 % Example 5 - Exhaustive Feature Selection and GridSearch # Initialize the dataset from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=1) Use scikit-learn's GridSearch to tune the hyperparameters of the LogisticRegression estimator inside the ExhaustiveFeatureSelector and use it for prediction in the pipeline. Note that the clone_estimator attribute needs to be set to False . 
from sklearn.model_selection import GridSearchCV from sklearn.pipeline import make_pipeline from sklearn.linear_model import LogisticRegression from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS lr = LogisticRegression(multi_class='multinomial', solver='lbfgs', random_state=123) efs1 = EFS(estimator=lr, min_features=2, max_features=3, scoring='accuracy', print_progress=False, clone_estimator=False, cv=5, n_jobs=1) pipe = make_pipeline(efs1, lr) param_grid = {'exhaustivefeatureselector__estimator__C': [0.1, 1.0, 10.0]} gs = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring='accuracy', n_jobs=1, cv=2, verbose=1, refit=False) # run grid search gs = gs.fit(X_train, y_train) Fitting 2 folds for each of 3 candidates, totalling 6 fits [Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 2.7s finished ... and the \"best\" parameters determined by GridSearch are ... print(\"Best parameters via GridSearch\", gs.best_params_) Best parameters via GridSearch {'exhaustivefeatureselector__estimator__C': 1.0} Obtaining the best k feature indices after GridSearch If we are interested in the best k feature indices via ExhaustiveFeatureSelector.best_idx_ , we have to initialize a GridSearchCV object with refit=True . Now, the grid search object will take the complete training dataset and the best parameters, which it found via cross-validation, to train the estimator pipeline. gs = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring='accuracy', n_jobs=1, cv=2, verbose=1, refit=True) After running the grid search, we can access the individual pipeline objects of the best_estimator_ via the steps attribute. 
gs = gs.fit(X_train, y_train) gs.best_estimator_.steps Fitting 2 folds for each of 3 candidates, totalling 6 fits [Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 2.9s finished [('exhaustivefeatureselector', ExhaustiveFeatureSelector(clone_estimator=False, cv=5, estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='multinomial', n_jobs=1, penalty='l2', random_state=123, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False), max_features=3, min_features=2, n_jobs=1, pre_dispatch='2*n_jobs', print_progress=False, scoring='accuracy')), ('logisticregression', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='multinomial', n_jobs=1, penalty='l2', random_state=123, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False))] Via sub-indexing, we can then obtain the best-selected feature subset: print('Best features:', gs.best_estimator_.steps[0][1].best_idx_) Best features: (2, 3) During cross-validation, this feature combination had a CV accuracy of: print('Best score:', gs.best_score_) Best score: 0.97 gs.best_params_ {'exhaustivefeatureselector__estimator__C': 1.0} Alternatively, if we ran GridSearchCV with refit=False , we can set the \"best grid search parameters\" in our pipeline manually. It should yield the same results: pipe.set_params(**gs.best_params_).fit(X_train, y_train) print('Best features:', pipe.steps[0][1].best_idx_) Best features: (2, 3) Example 6 - Working with pandas DataFrames Optionally, we can also use pandas DataFrames and pandas Series as input to the fit function. In this case, the column names of the pandas DataFrame will be used as feature names. However, note that if custom_feature_names are provided in the fit function, these custom_feature_names take precedence over the DataFrame column-based feature names. 
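The precedence rule described here can be written as a few lines of Python. resolve_feature_names is a hypothetical helper for illustration, not part of mlxtend's API, and treating any object with a columns attribute as a DataFrame is a simplifying assumption. Without custom names or DataFrame columns, the fallback yields the string indices seen in Example 1.

```python
import numpy as np

def resolve_feature_names(X, custom_feature_names=None):
    # Hypothetical helper mirroring the documented precedence:
    # 1) custom_feature_names passed to fit,
    # 2) the column names of a pandas DataFrame,
    # 3) string versions of the column indices.
    if custom_feature_names is not None:
        return tuple(custom_feature_names)
    if hasattr(X, 'columns'):
        return tuple(X.columns)
    return tuple(str(i) for i in range(np.asarray(X).shape[1]))

X = np.zeros((5, 4))
names = resolve_feature_names(X)
best_idx = (0, 2, 3)
# Map the selected indices to names, as in best_feature_names_
print(tuple(names[i] for i in best_idx))  # ('0', '2', '3')
```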
import pandas as pd from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris iris = load_iris() col_names = ('sepal length', 'sepal width', 'petal length', 'petal width') X_df = pd.DataFrame(iris.data, columns=col_names) y_series = pd.Series(iris.target) knn = KNeighborsClassifier(n_neighbors=4) from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS knn = KNeighborsClassifier(n_neighbors=3) efs1 = EFS(knn, min_features=1, max_features=4, scoring='accuracy', print_progress=True, cv=5) efs1 = efs1.fit(X_df, y_series) print('Best accuracy score: %.2f' % efs1.best_score_) print('Best subset (indices):', efs1.best_idx_) print('Best subset (corresponding names):', efs1.best_feature_names_) Features: 15/15 Best accuracy score: 0.97 Best subset (indices): (0, 2, 3) Best subset (corresponding names): ('sepal length', 'petal length', 'petal width') API ExhaustiveFeatureSelector(estimator, min_features=1, max_features=1, print_progress=True, scoring='accuracy', cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Exhaustive Feature Selection for Classification and Regression. (new in v0.4.3) Parameters estimator : scikit-learn classifier or regressor min_features : int (default: 1) Minimum number of features to select max_features : int (default: 1) Maximum number of features to select print_progress : bool (default: True) Prints progress as the number of epochs to stderr. scoring : str, (default='accuracy') Scoring metric in {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} for regressors, or a callable object or function with signature scorer(estimator, X, y) . cv : int (default: 5) Scikit-learn cross-validation generator or int . If estimator is a classifier (or y consists of integer class labels), stratified k-fold is performed, and regular k-fold cross-validation otherwise. No cross-validation if cv is None, False, or 0. 
n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int, or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1 . Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs An int, giving the exact number of total jobs that are spawned A string, giving an expression as a function of n_jobs, as in 2*n_jobs clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, it is required to set cv=0, and n_jobs=1. Attributes best_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. best_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. best_score_ : float Cross validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the exhaustive selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset) 'feature_names' (tuple of feature names of the feat. 
subset) 'cv_scores' (list of individual cross-validation scores) 'avg_score' (average cross-validation score) Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/ Methods fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names and self.subsets_[i]['feature_names'] . (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data and return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. fit_params : dict of string -> object, optional Parameters to pass to the fit method of classifier. Returns Feature subset of X, shape={n_samples, k_features} get_metric_dict(confidence_interval=0.95) Return metric dictionary Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. 
Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset 'cv_scores': list with individual CV scores 'avg_score': CV average score 'std_dev': standard deviation of the CV score average 'std_err': standard error of the CV score average 'ci_bound': confidence interval bound of the CV score average get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self transform(X) Return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. Returns Feature subset of X, shape={n_samples, k_features}","title":"Exhaustive Feature Selector"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#exhaustive-feature-selector","text":"Implementation of an exhaustive feature selector for sampling and evaluating all possible feature combinations in a specified range. 
from mlxtend.feature_selection import ExhaustiveFeatureSelector","title":"Exhaustive Feature Selector"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#overview","text":"This exhaustive feature selection algorithm is a wrapper approach for brute-force evaluation of feature subsets; the best subset is selected by optimizing a specified performance metric given an arbitrary regressor or classifier. For instance, if the classifier is a logistic regression and the dataset consists of 4 features, the algorithm will evaluate all 15 feature combinations (if min_features=1 and max_features=4 ) {0} {1} {2} {3} {0, 1} {0, 2} {0, 3} {1, 2} {1, 3} {2, 3} {0, 1, 2} {0, 1, 3} {0, 2, 3} {1, 2, 3} {0, 1, 2, 3} and select the one that results in the best performance (e.g., classification accuracy) of the logistic regression classifier.","title":"Overview"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#example-1-a-simple-iris-example","text":"Initializing a simple classifier from scikit-learn: from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS iris = load_iris() X = iris.data y = iris.target knn = KNeighborsClassifier(n_neighbors=3) efs1 = EFS(knn, min_features=1, max_features=4, scoring='accuracy', print_progress=True, cv=5) efs1 = efs1.fit(X, y) print('Best accuracy score: %.2f' % efs1.best_score_) print('Best subset (indices):', efs1.best_idx_) print('Best subset (corresponding names):', efs1.best_feature_names_) Features: 15/15 Best accuracy score: 0.97 Best subset (indices): (0, 2, 3) Best subset (corresponding names): ('0', '2', '3') Note that in the example above, the 'best_feature_names_' are simply a string equivalent of the feature indices. 
However, we can provide custom feature names to the fit function for this mapping: feature_names = ('sepal length', 'sepal width', 'petal length', 'petal width') efs1 = efs1.fit(X, y, custom_feature_names=feature_names) print('Best subset (corresponding names):', efs1.best_feature_names_) Features: 15/15 Best subset (corresponding names): ('sepal length', 'petal length', 'petal width') Via the subsets_ attribute, we can take a look at the selected feature indices at each step: efs1.subsets_ {0: {'avg_score': 0.65999999999999992, 'cv_scores': array([ 0.53333333, 0.63333333, 0.73333333, 0.76666667, 0.63333333]), 'feature_idx': (0,), 'feature_names': ('sepal length',)}, 1: {'avg_score': 0.56666666666666665, 'cv_scores': array([ 0.53333333, 0.63333333, 0.6 , 0.5 , 0.56666667]), 'feature_idx': (1,), 'feature_names': ('sepal width',)}, 2: {'avg_score': 0.95333333333333337, 'cv_scores': array([ 0.93333333, 1. , 0.9 , 0.93333333, 1. ]), 'feature_idx': (2,), 'feature_names': ('petal length',)}, 3: {'avg_score': 0.94666666666666666, 'cv_scores': array([ 0.96666667, 0.96666667, 0.93333333, 0.86666667, 1. ]), 'feature_idx': (3,), 'feature_names': ('petal width',)}, 4: {'avg_score': 0.72666666666666668, 'cv_scores': array([ 0.66666667, 0.8 , 0.63333333, 0.86666667, 0.66666667]), 'feature_idx': (0, 1), 'feature_names': ('sepal length', 'sepal width')}, 5: {'avg_score': 0.94666666666666666, 'cv_scores': array([ 0.96666667, 1. , 0.86666667, 0.93333333, 0.96666667]), 'feature_idx': (0, 2), 'feature_names': ('sepal length', 'petal length')}, 6: {'avg_score': 0.95333333333333337, 'cv_scores': array([ 0.96666667, 0.96666667, 0.9 , 0.93333333, 1. ]), 'feature_idx': (0, 3), 'feature_names': ('sepal length', 'petal width')}, 7: {'avg_score': 0.94666666666666666, 'cv_scores': array([ 0.96666667, 1. 
, 0.9 , 0.93333333, 0.93333333]), 'feature_idx': (1, 2), 'feature_names': ('sepal width', 'petal length')}, 8: {'avg_score': 0.94000000000000006, 'cv_scores': array([ 0.96666667, 0.96666667, 0.86666667, 0.93333333, 0.96666667]), 'feature_idx': (1, 3), 'feature_names': ('sepal width', 'petal width')}, 9: {'avg_score': 0.95333333333333337, 'cv_scores': array([ 0.96666667, 0.96666667, 0.9 , 0.93333333, 1. ]), 'feature_idx': (2, 3), 'feature_names': ('petal length', 'petal width')}, 10: {'avg_score': 0.94000000000000006, 'cv_scores': array([ 0.96666667, 0.96666667, 0.86666667, 0.93333333, 0.96666667]), 'feature_idx': (0, 1, 2), 'feature_names': ('sepal length', 'sepal width', 'petal length')}, 11: {'avg_score': 0.94666666666666666, 'cv_scores': array([ 0.93333333, 0.96666667, 0.9 , 0.93333333, 1. ]), 'feature_idx': (0, 1, 3), 'feature_names': ('sepal length', 'sepal width', 'petal width')}, 12: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.96666667, 0.96666667, 0.96666667, 0.96666667, 1. ]), 'feature_idx': (0, 2, 3), 'feature_names': ('sepal length', 'petal length', 'petal width')}, 13: {'avg_score': 0.95999999999999996, 'cv_scores': array([ 0.96666667, 0.96666667, 0.93333333, 0.93333333, 1. ]), 'feature_idx': (1, 2, 3), 'feature_names': ('sepal width', 'petal length', 'petal width')}, 14: {'avg_score': 0.96666666666666679, 'cv_scores': array([ 0.96666667, 0.96666667, 0.93333333, 0.96666667, 1. ]), 'feature_idx': (0, 1, 2, 3), 'feature_names': ('sepal length', 'sepal width', 'petal length', 'petal width')}}","title":"Example 1 - A simple Iris Example"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#example-2-visualizing-the-feature-selection-results","text":"For our convenience, we can visualize the output from the feature selection in a pandas DataFrame format using the get_metric_dict method of the ExhaustiveFeatureSelector object. 
The columns std_dev and std_err represent the standard deviation and standard errors of the cross-validation scores, respectively. Below, we see the DataFrame of the Exhaustive Feature Selector from Example 1: import pandas as pd iris = load_iris() X = iris.data y = iris.target knn = KNeighborsClassifier(n_neighbors=3) efs1 = EFS(knn, min_features=1, max_features=4, scoring='accuracy', print_progress=True, cv=5) feature_names = ('sepal length', 'sepal width', 'petal length', 'petal width') efs1 = efs1.fit(X, y, custom_feature_names=feature_names) df = pd.DataFrame.from_dict(efs1.get_metric_dict()).T df.sort_values('avg_score', inplace=True, ascending=False) df Features: 15/15 avg_score ci_bound cv_scores feature_idx feature_names std_dev std_err 12 0.973333 0.0171372 [0.966666666667, 0.966666666667, 0.96666666666... (0, 2, 3) (sepal length, petal length, petal width) 0.0133333 0.00666667 14 0.966667 0.0270963 [0.966666666667, 0.966666666667, 0.93333333333... (0, 1, 2, 3) (sepal length, sepal width, petal length, peta... 0.0210819 0.0105409 13 0.96 0.0320608 [0.966666666667, 0.966666666667, 0.93333333333... (1, 2, 3) (sepal width, petal length, petal width) 0.0249444 0.0124722 2 0.953333 0.0514116 [0.933333333333, 1.0, 0.9, 0.933333333333, 1.0] (2,) (petal length,) 0.04 0.02 6 0.953333 0.0436915 [0.966666666667, 0.966666666667, 0.9, 0.933333... (0, 3) (sepal length, petal width) 0.0339935 0.0169967 9 0.953333 0.0436915 [0.966666666667, 0.966666666667, 0.9, 0.933333... (2, 3) (petal length, petal width) 0.0339935 0.0169967 3 0.946667 0.0581151 [0.966666666667, 0.966666666667, 0.93333333333... (3,) (petal width,) 0.0452155 0.0226078 5 0.946667 0.0581151 [0.966666666667, 1.0, 0.866666666667, 0.933333... 
(0, 2) (sepal length, petal length) 0.0452155 0.0226078 7 0.946667 0.0436915 [0.966666666667, 1.0, 0.9, 0.933333333333, 0.9... (1, 2) (sepal width, petal length) 0.0339935 0.0169967 11 0.946667 0.0436915 [0.933333333333, 0.966666666667, 0.9, 0.933333... (0, 1, 3) (sepal length, sepal width, petal width) 0.0339935 0.0169967 8 0.94 0.0499631 [0.966666666667, 0.966666666667, 0.86666666666... (1, 3) (sepal width, petal width) 0.038873 0.0194365 10 0.94 0.0499631 [0.966666666667, 0.966666666667, 0.86666666666... (0, 1, 2) (sepal length, sepal width, petal length) 0.038873 0.0194365 4 0.726667 0.11623 [0.666666666667, 0.8, 0.633333333333, 0.866666... (0, 1) (sepal length, sepal width) 0.0904311 0.0452155 0 0.66 0.106334 [0.533333333333, 0.633333333333, 0.73333333333... (0,) (sepal length,) 0.0827312 0.0413656 1 0.566667 0.0605892 [0.533333333333, 0.633333333333, 0.6, 0.5, 0.5... (1,) (sepal width,) 0.0471405 0.0235702 import matplotlib.pyplot as plt metric_dict = efs1.get_metric_dict() fig = plt.figure() k_feat = sorted(metric_dict.keys()) avg = [metric_dict[k]['avg_score'] for k in k_feat] upper, lower = [], [] for k in k_feat: upper.append(metric_dict[k]['avg_score'] + metric_dict[k]['std_dev']) lower.append(metric_dict[k]['avg_score'] - metric_dict[k]['std_dev']) plt.fill_between(k_feat, upper, lower, alpha=0.2, color='blue', lw=1) plt.plot(k_feat, avg, color='blue', marker='o') plt.ylabel('Accuracy +/- Standard Deviation') plt.xlabel('Number of Features') feature_min = len(metric_dict[k_feat[0]]['feature_idx']) feature_max = len(metric_dict[k_feat[-1]]['feature_idx']) plt.xticks(k_feat, [str(metric_dict[k]['feature_names']) for k in k_feat], rotation=90) plt.show()","title":"Example 2 - Visualizing the feature selection results"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#example-3-exhaustive-feature-selection-for-regression","text":"Similar to the classification examples above, the ExhaustiveFeatureSelector also supports scikit-learn's 
estimators for regression. from sklearn.linear_model import LinearRegression from sklearn.datasets import load_boston boston = load_boston() X, y = boston.data, boston.target lr = LinearRegression() efs = EFS(lr, min_features=10, max_features=12, scoring='neg_mean_squared_error', cv=10) efs.fit(X, y) print('Best MSE score: %.2f' % (efs.best_score_ * (-1))) print('Best subset:', efs.best_idx_) Features: 377/377 Best subset: (0, 1, 4, 6, 7, 8, 9, 10, 11, 12)","title":"Example 3 - Exhaustive Feature Selection for Regression"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#example-4-using-the-selected-feature-subset-for-making-new-predictions","text":"# Initialize the dataset from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=1) knn = KNeighborsClassifier(n_neighbors=3) # Select the \"best\" feature subset via # 5-fold cross-validation on the training set. 
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS efs1 = EFS(knn, min_features=1, max_features=4, scoring='accuracy', cv=5) efs1 = efs1.fit(X_train, y_train) Features: 15/15 print('Selected features:', efs1.best_idx_) Selected features: (2, 3) # Generate the new subsets based on the selected features # Note that the transform call is equivalent to # X_train[:, efs1.best_idx_] X_train_efs = efs1.transform(X_train) X_test_efs = efs1.transform(X_test) # Fit the estimator using the new feature subset # and make a prediction on the test data knn.fit(X_train_efs, y_train) y_pred = knn.predict(X_test_efs) # Compute the accuracy of the prediction acc = float((y_test == y_pred).sum()) / y_pred.shape[0] print('Test set accuracy: %.2f %%' % (acc*100)) Test set accuracy: 96.00 %","title":"Example 4 - Using the Selected Feature Subset For Making New Predictions"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#example-5-exhaustive-feature-selection-and-gridsearch","text":"# Initialize the dataset from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=1) Use scikit-learn's GridSearch to tune the hyperparameters of the LogisticRegression estimator inside the ExhaustiveFeatureSelector and use it for prediction in the pipeline. Note that the clone_estimator parameter needs to be set to False . 
from sklearn.model_selection import GridSearchCV from sklearn.pipeline import make_pipeline from sklearn.linear_model import LogisticRegression from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS lr = LogisticRegression(multi_class='multinomial', solver='lbfgs', random_state=123) efs1 = EFS(estimator=lr, min_features=2, max_features=3, scoring='accuracy', print_progress=False, clone_estimator=False, cv=5, n_jobs=1) pipe = make_pipeline(efs1, lr) param_grid = {'exhaustivefeatureselector__estimator__C': [0.1, 1.0, 10.0]} gs = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring='accuracy', n_jobs=1, cv=2, verbose=1, refit=False) # run grid search gs = gs.fit(X_train, y_train) Fitting 2 folds for each of 3 candidates, totalling 6 fits [Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 2.7s finished ... and the \"best\" parameters determined by GridSearch are ... print(\"Best parameters via GridSearch\", gs.best_params_) Best parameters via GridSearch {'exhaustivefeatureselector__estimator__C': 1.0}","title":"Example 5 - Exhaustive Feature Selection and GridSearch"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#obtaining-the-best-k-feature-indices-after-gridsearch","text":"If we are interested in the best k feature indices via ExhaustiveFeatureSelector.best_idx_ , we have to initialize a GridSearchCV object with refit=True . Now, the grid search object will take the complete training dataset and the best parameters, which it found via cross-validation, to train the estimator pipeline. gs = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring='accuracy', n_jobs=1, cv=2, verbose=1, refit=True) After running the grid search, we can access the individual pipeline objects of the best_estimator_ via the steps attribute. 
gs = gs.fit(X_train, y_train) gs.best_estimator_.steps Fitting 2 folds for each of 3 candidates, totalling 6 fits [Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 2.9s finished [('exhaustivefeatureselector', ExhaustiveFeatureSelector(clone_estimator=False, cv=5, estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='multinomial', n_jobs=1, penalty='l2', random_state=123, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False), max_features=3, min_features=2, n_jobs=1, pre_dispatch='2*n_jobs', print_progress=False, scoring='accuracy')), ('logisticregression', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='multinomial', n_jobs=1, penalty='l2', random_state=123, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False))] Via sub-indexing, we can then obtain the best-selected feature subset: print('Best features:', gs.best_estimator_.steps[0][1].best_idx_) Best features: (2, 3) During cross-validation, this feature combination had a CV accuracy of: print('Best score:', gs.best_score_) Best score: 0.97 gs.best_params_ {'exhaustivefeatureselector__estimator__C': 1.0} Alternatively, we can set the \"best grid search parameters\" in our pipeline manually if we ran GridSearchCV with refit=False . It should yield the same results: pipe.set_params(**gs.best_params_).fit(X_train, y_train) print('Best features:', pipe.steps[0][1].best_idx_) Best features: (2, 3)","title":"Obtaining the best k feature indices after GridSearch"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#example-6-working-with-pandas-dataframes","text":"Optionally, we can also use pandas DataFrames and pandas Series as input to the fit function. In this case, the column names of the pandas DataFrame will be used as feature names. 
However, note that if custom_feature_names are provided in the fit function, these custom_feature_names take precedence over the DataFrame column-based feature names. import pandas as pd from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris iris = load_iris() col_names = ('sepal length', 'sepal width', 'petal length', 'petal width') X_df = pd.DataFrame(iris.data, columns=col_names) y_series = pd.Series(iris.target) knn = KNeighborsClassifier(n_neighbors=4) from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS knn = KNeighborsClassifier(n_neighbors=3) efs1 = EFS(knn, min_features=1, max_features=4, scoring='accuracy', print_progress=True, cv=5) efs1 = efs1.fit(X_df, y_series) print('Best accuracy score: %.2f' % efs1.best_score_) print('Best subset (indices):', efs1.best_idx_) print('Best subset (corresponding names):', efs1.best_feature_names_) Features: 15/15 Best accuracy score: 0.97 Best subset (indices): (0, 2, 3) Best subset (corresponding names): ('sepal length', 'petal length', 'petal width')","title":"Example 6 - Working with pandas DataFrames"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#api","text":"ExhaustiveFeatureSelector(estimator, min_features=1, max_features=1, print_progress=True, scoring='accuracy', cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Exhaustive Feature Selection for Classification and Regression. (new in v0.4.3) Parameters estimator : scikit-learn classifier or regressor min_features : int (default: 1) Minimum number of features to select max_features : int (default: 1) Maximum number of features to select print_progress : bool (default: True) Prints progress as the number of epochs to stderr. 
scoring : str, (default='accuracy') Scoring metric in {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} for regressors, or a callable object or function with signature scorer(estimator, X, y) . cv : int (default: 5) Scikit-learn cross-validation generator or int . If estimator is a classifier (or y consists of integer class labels), stratified k-fold is performed, and regular k-fold cross-validation otherwise. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int, or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1 . Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs An int, giving the exact number of total jobs that are spawned A string, giving an expression as a function of n_jobs, as in 2*n_jobs clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, it is required to set cv=0, and n_jobs=1. Attributes best_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. best_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. 
best_score_ : float Cross validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the exhaustive selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset) 'feature_names' (tuple of feature names of the feat. subset) 'cv_scores' (list of individual cross-validation scores) 'avg_score' (average cross-validation score) Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/","title":"API"},{"location":"user_guide/feature_selection/ExhaustiveFeatureSelector/#methods","text":"fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names and self.subsets_[i]['feature_names'] . (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data and return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. 
New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. fit_params : dict of string -> object, optional Parameters to pass to the fit method of classifier. Returns Feature subset of X, shape={n_samples, k_features} get_metric_dict(confidence_interval=0.95) Return metric dictionary Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset 'cv_scores': list with individual CV scores 'avg_score': CV average score 'std_dev': standard deviation of the CV score average 'std_err': standard error of the CV score average 'ci_bound': confidence interval bound of the CV score average get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form component__parameter so that it's possible to update each component of a nested object. Returns self transform(X) Return the best selected features from X. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. 
Returns Feature subset of X, shape={n_samples, k_features}","title":"Methods"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/","text":"Sequential Feature Selector Implementation of sequential feature algorithms (SFAs) -- greedy search algorithms -- that have been developed as a suboptimal solution to the often computationally infeasible exhaustive search. from mlxtend.feature_selection import SequentialFeatureSelector Overview Sequential feature selection algorithms are a family of greedy search algorithms that are used to reduce an initial d -dimensional feature space to a k -dimensional feature subspace where k < d . The motivation behind feature selection algorithms is to automatically select a subset of features that is most relevant to the problem. The goal of feature selection is two-fold: We want to improve the computational efficiency and reduce the generalization error of the model by removing irrelevant features or noise. A wrapper approach such as sequential feature selection is especially useful if embedded feature selection -- for example, a regularization penalty like LASSO -- is not applicable. In a nutshell, SFAs remove or add one feature at a time based on the classifier performance until a feature subset of the desired size k is reached. There are 4 different flavors of SFAs available via the SequentialFeatureSelector : Sequential Forward Selection (SFS) Sequential Backward Selection (SBS) Sequential Forward Floating Selection (SFFS) Sequential Backward Floating Selection (SBFS) The floating variants, SFFS and SBFS, can be considered as extensions to the simpler SFS and SBS algorithms. The floating algorithms have an additional exclusion or inclusion step to remove features once they were included (or excluded), so that a larger number of feature subset combinations can be sampled. 
It is important to emphasize that this step is conditional and only occurs if the resulting feature subset is assessed as \"better\" by the criterion function after removal (or addition) of a particular feature. Furthermore, I added an optional check to skip the conditional exclusion steps if the algorithm gets stuck in cycles. How is this different from Recursive Feature Elimination (RFE) -- e.g., as implemented in sklearn.feature_selection.RFE ? RFE is computationally less complex: it uses the feature weight coefficients (e.g., linear models) or feature importance (tree-based algorithms) to eliminate features recursively, whereas SFAs eliminate (or add) features based on a user-defined classifier/regression performance metric. The SFAs are outlined in pseudocode below: Sequential Forward Selection (SFS) Input: Y = \\{y_1, y_2, ..., y_d\\} The SFS algorithm takes the whole d -dimensional feature set as input. Output: X_k = \\{x_j \\; | \\;j = 1, 2, ..., k; \\; x_j \\in Y\\} , where k = (0, 1, 2, ..., d) SFS returns a subset of features; the number of selected features k , where k < d , has to be specified a priori . Initialization: X_0 = \\emptyset , k = 0 We initialize the algorithm with an empty set \\emptyset (\"null set\") so that k = 0 (where k is the size of the subset). Step 1 (Inclusion): x^+ = \\text{ arg max } J(x_k + x), \\text{ where } x \\in Y - X_k X_{k+1} = X_k + x^+ k = k + 1 Go to Step 1 In this step, we add an additional feature, x^+ , to our feature subset X_k . x^+ is the feature that maximizes our criterion function, that is, the feature that is associated with the best classifier performance if it is added to X_k . We repeat this procedure until the termination criterion is satisfied. Termination: k = p We add features to the feature subset X_k until the feature subset of size k contains the number of desired features p that we specified a priori .
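The SFS pseudocode above can be sketched in plain Python. This is a minimal illustration and not mlxtend's implementation. The criterion J is an ordinary function passed in by the caller. The per-feature weights in the hypothetical toy_criterion stand in for a cross-validated classifier score.

```python
def sequential_forward_selection(features, criterion, k):
    '''Greedy SFS: start from the empty set and repeatedly add the feature
    that maximizes the criterion J until the subset holds k features.'''
    selected = []               # X_0 = empty set, so k starts at 0
    remaining = list(features)  # candidates Y - X_k
    history = {}                # subset found at each size, similar to subsets_
    while len(selected) < k:
        # Step 1 (Inclusion): x+ = arg max J(X_k + x) over x in Y - X_k
        x_plus = max(remaining,
                     key=lambda x: criterion(tuple(sorted(selected + [x]))))
        selected.append(x_plus)
        remaining.remove(x_plus)
        history[len(selected)] = tuple(sorted(selected))
    return tuple(sorted(selected)), history


# Hypothetical criterion: each feature contributes a fixed weight, so J(X)
# is the sum of the weights of the selected features.
WEIGHTS = {0: 0.1, 1: 0.3, 2: 0.5, 3: 0.9}


def toy_criterion(subset):
    return sum(WEIGHTS[i] for i in subset)


best, history = sequential_forward_selection(range(4), toy_criterion, k=3)
print(best)     # (1, 2, 3)
print(history)  # {1: (3,), 2: (2, 3), 3: (1, 2, 3)}
```

Because J is re-evaluated for every candidate at every step, the cost grows roughly with d times k criterion evaluations. With a real classifier, each evaluation is a full cross-validation run.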
Sequential Backward Selection (SBS) Input: the set of all features, Y = \\{y_1, y_2, ..., y_d\\} The SBS algorithm takes the whole feature set as input. Output: X_k = \\{x_j \\; | \\;j = 1, 2, ..., k; \\; x_j \\in Y\\} , where k = (0, 1, 2, ..., d) SBS returns a subset of features; the number of selected features k , where k < d , has to be specified a priori . Initialization: X_0 = Y , k = d We initialize the algorithm with the given feature set so that k = d . Step 1 (Exclusion): x^- = \\text{ arg max } J(x_k - x), \\text{ where } x \\in X_k X_{k-1} = X_k - x^- k = k - 1 Go to Step 1 In this step, we remove a feature, x^- , from our feature subset X_k . x^- is the feature that maximizes our criterion function upon removal, that is, the feature that is associated with the best classifier performance if it is removed from X_k . We repeat this procedure until the termination criterion is satisfied. Termination: k = p We remove features from the feature subset X_k until the feature subset of size k contains the number of desired features p that we specified a priori . Sequential Backward Floating Selection (SBFS) Input: the set of all features, Y = \\{y_1, y_2, ..., y_d\\} The SBFS algorithm takes the whole feature set as input. Output: X_k = \\{x_j \\; | \\;j = 1, 2, ..., k; \\; x_j \\in Y\\} , where k = (0, 1, 2, ..., d) SBFS returns a subset of features; the number of selected features k , where k < d , has to be specified a priori . Initialization: X_0 = Y , k = d We initialize the algorithm with the given feature set so that k = d . Step 1 (Exclusion): x^- = \\text{ arg max } J(x_k - x), \\text{ where } x \\in X_k X_{k-1} = X_k - x^- k = k - 1 Go to Step 2 In this step, we remove a feature, x^- , from our feature subset X_k . x^- is the feature that maximizes our criterion function upon removal, that is, the feature that is associated with the best classifier performance if it is removed from X_k .
Step 2 (Conditional Inclusion): x^+ = \\text{ arg max } J(x_k + x), \\text{ where } x \\in Y - X_k if J(x_k + x^+) > J(x_k) : X_{k+1} = X_k + x^+ k = k + 1 Go to Step 1 In Step 2, we search for features that improve the classifier performance if they are added back to the feature subset. If such features exist, we add the feature x^+ for which the performance improvement is maximized. If k = 2 or an improvement cannot be made (i.e., such feature x^+ cannot be found), go back to step 1; else, repeat this step. Termination: k = p We remove features from the feature subset X_k until the feature subset of size k contains the number of desired features p that we specified a priori . Sequential Forward Floating Selection (SFFS) Input: the set of all features, Y = \\{y_1, y_2, ..., y_d\\} The SFFS algorithm takes the whole feature set as input, e.g., a feature space consisting of 10 dimensions ( d = 10 ). Output: a subset of features, X_k = \\{x_j \\; | \\;j = 1, 2, ..., k; \\; x_j \\in Y\\} , where k = (0, 1, 2, ..., d) The returned output of the algorithm is a subset of the feature space of a specified size, e.g., a subset of 5 features from a 10-dimensional feature space ( k = 5, d = 10 ). Initialization: X_0 = \\emptyset , k = 0 We initialize the algorithm with an empty set \\emptyset (\"null set\") so that k = 0 (where k is the size of the subset). Step 1 (Inclusion): x^+ = \\text{ arg max } J(x_k + x), \\text{ where } x \\in Y - X_k X_{k+1} = X_k + x^+ k = k + 1 Go to Step 2 Step 2 (Conditional Exclusion): x^- = \\text{ arg max } J(x_k - x), \\text{ where } x \\in X_k if \\; J(x_k - x^-) > J(x_k) : X_{k-1} = X_k - x^- k = k - 1 Go to Step 1 In step 1, we include the feature from the feature space that leads to the best performance increase for our feature subset (assessed by the criterion function ). Then, we go to step 2. In step 2, we only remove a feature if the resulting subset would gain an increase in performance.
If k = 2 or an improvement cannot be made (i.e., such feature x^- cannot be found), go back to step 1; else, repeat this step. Steps 1 and 2 are repeated until the termination criterion is reached. Termination: stop when k equals the number of desired features. References Ferri, F. J., Pudil, P., Hatef, M., & Kittler, J. (1994). \"Comparative study of techniques for large-scale feature selection.\" Pattern Recognition in Practice IV: 403-413. Pudil, P., Novovi\u010dov\u00e1, J., & Kittler, J. (1994). \"Floating search methods in feature selection.\" Pattern Recognition Letters 15(11): 1119-1125. Example 1 - A simple Sequential Forward Selection example Initializing a simple classifier from scikit-learn: from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris iris = load_iris() X = iris.data y = iris.target knn = KNeighborsClassifier(n_neighbors=4) We start by selecting the \"best\" 3 features from the Iris dataset via Sequential Forward Selection (SFS). Here, we set forward=True and floating=False . By choosing cv=0 , we don't perform any cross-validation; therefore, the performance (here: 'accuracy' ) is computed entirely on the training set.
from mlxtend.feature_selection import SequentialFeatureSelector as SFS sfs1 = SFS(knn, k_features=3, forward=True, floating=False, verbose=2, scoring='accuracy', cv=0) sfs1 = sfs1.fit(X, y) [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 1/3 -- score: 0.96[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 2/3 -- score: 0.973333333333[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 3/3 -- score: 0.973333333333 Via the subsets_ attribute, we can take a look at the selected feature indices at each step: sfs1.subsets_ {1: {'avg_score': 0.95999999999999996, 'cv_scores': array([ 0.96]), 'feature_idx': (3,), 'feature_names': ('3',)}, 2: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (2, 3), 'feature_names': ('2', '3')}, 3: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (1, 2, 3), 'feature_names': ('1', '2', '3')}} Note that the 'feature_names' entry is simply a string representation of the 'feature_idx' in this case. 
Optionally, we can provide custom feature names via the fit method's custom_feature_names parameter: feature_names = ('sepal length', 'sepal width', 'petal length', 'petal width') sfs1 = sfs1.fit(X, y, custom_feature_names=feature_names) sfs1.subsets_ [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 1/3 -- score: 0.96[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 2/3 -- score: 0.973333333333[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 3/3 -- score: 0.973333333333 {1: {'avg_score': 0.95999999999999996, 'cv_scores': array([ 0.96]), 'feature_idx': (3,), 'feature_names': ('petal width',)}, 2: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (2, 3), 'feature_names': ('petal length', 'petal width')}, 3: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (1, 2, 3), 'feature_names': ('sepal width', 'petal length', 'petal width')}} Furthermore, we can access the indices of the 3 best features directly via the k_feature_idx_ attribute: sfs1.k_feature_idx_ (1, 2, 3) And similarly, to obtain the names of these features, given that we provided an argument to the custom_feature_names parameter, we can refer to the sfs1.k_feature_names_ attribute: sfs1.k_feature_names_ ('sepal width', 'petal length', 'petal width') Finally, the prediction score for these 3 features can be accessed via k_score_ : sfs1.k_score_ 0.97333333333333338 Example 2 - Toggling between SFS, SBS, SFFS, and SBFS Using the forward and floating parameters, we can toggle between SFS, SBS, SFFS, and SBFS as shown below.
Note that we are performing (stratified) 4-fold cross-validation for more robust estimates in contrast to Example 1. Via n_jobs=-1 , we choose to run the cross-validation on all our available CPU cores. # Sequential Forward Selection sfs = SFS(knn, k_features=3, forward=True, floating=False, scoring='accuracy', cv=4, n_jobs=-1) sfs = sfs.fit(X, y) print('\\nSequential Forward Selection (k=3):') print(sfs.k_feature_idx_) print('CV Score:') print(sfs.k_score_) ################################################### # Sequential Backward Selection sbs = SFS(knn, k_features=3, forward=False, floating=False, scoring='accuracy', cv=4, n_jobs=-1) sbs = sbs.fit(X, y) print('\\nSequential Backward Selection (k=3):') print(sbs.k_feature_idx_) print('CV Score:') print(sbs.k_score_) ################################################### # Sequential Forward Floating Selection sffs = SFS(knn, k_features=3, forward=True, floating=True, scoring='accuracy', cv=4, n_jobs=-1) sffs = sffs.fit(X, y) print('\\nSequential Forward Floating Selection (k=3):') print(sffs.k_feature_idx_) print('CV Score:') print(sffs.k_score_) ################################################### # Sequential Backward Floating Selection sbfs = SFS(knn, k_features=3, forward=False, floating=True, scoring='accuracy', cv=4, n_jobs=-1) sbfs = sbfs.fit(X, y) print('\\nSequential Backward Floating Selection (k=3):') print(sbfs.k_feature_idx_) print('CV Score:') print(sbfs.k_score_) Sequential Forward Selection (k=3): (1, 2, 3) CV Score: 0.972756410256 Sequential Backward Selection (k=3): (1, 2, 3) CV Score: 0.972756410256 Sequential Forward Floating Selection (k=3): (1, 2, 3) CV Score: 0.972756410256 Sequential Backward Floating Selection (k=3): (1, 2, 3) CV Score: 0.972756410256 In this simple scenario, selecting the best 3 features out of the 4 available features in the Iris set, we end up with the same results regardless of which sequential selection algorithm we use.
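To make the effect of the forward and floating flags concrete, here is a stdlib-only sketch of all four variants in one function. It is an illustration under stated assumptions, not mlxtend's code:

- The criterion is a hand-made lookup table of hypothetical values; every unlisted subset scores 0.0.
- A conditional step is accepted only if it beats both J(X_k) and the best subset already recorded at that size. This is one simple way to rule out cycles.

Unlike the Iris example above, where all four variants agree, this criterion makes plain SFS commit early to feature 0 and end at (0, 1, 3), while SFFS backtracks and reaches (1, 2, 3).

```python
def sequential_search(features, criterion, k, forward=True, floating=False):
    '''Sketch of SFS, SBS, SFFS and SBFS, selected via forward/floating.
    Returns the best subset of size k that was found and its score.'''
    all_feats = tuple(sorted(features))

    def score(subset):
        return criterion(tuple(sorted(subset)))

    def include(subset):
        # x+ = arg max J(X_k + x) over x in Y - X_k
        return max((subset | {x} for x in all_feats if x not in subset), key=score)

    def exclude(subset):
        # x- = arg max J(X_k - x) over x in X_k
        return max((subset - {x} for x in sorted(subset)), key=score)

    main_step, cond_step = (include, exclude) if forward else (exclude, include)
    current = frozenset() if forward else frozenset(all_feats)
    best = {}  # subset size -> best subset seen at that size

    def record(subset):
        # Store the subset if it beats everything seen at its size so far.
        size = len(subset)
        if size not in best or score(subset) > score(best[size]):
            best[size] = subset
            return True
        return False

    if current:
        record(current)
    while len(current) != k:
        current = main_step(current)
        record(current)
        if not floating:
            continue
        # Conditional step in the opposite direction, skipped near the start
        # of the search (a simplified version of the k = 2 rule).
        while (len(current) > 2) if forward else (len(current) < len(all_feats) - 2):
            candidate = cond_step(current)
            # Accept only if J(candidate) > J(X_k) and the candidate also beats
            # the best subset already recorded at its size (prevents cycles).
            if score(candidate) > score(current) and record(candidate):
                current = candidate
            else:
                break
    return tuple(sorted(best[k])), score(best[k])


# Hypothetical criterion values; any subset not listed scores 0.0.
TABLE = {
    (0,): 0.9, (1,): 0.2, (2,): 0.1, (3,): 0.3,
    (0, 1): 0.5, (0, 2): 0.55, (0, 3): 0.6,
    (1, 2): 0.4, (1, 3): 0.95, (2, 3): 0.3,
    (0, 1, 3): 0.7, (0, 2, 3): 0.65, (1, 2, 3): 1.0,
}


def toy_criterion(subset):
    return TABLE.get(subset, 0.0)


for forward in (True, False):
    for floating in (False, True):
        print(forward, floating,
              sequential_search(range(4), toy_criterion, 3, forward, floating))
```

How each variant behaves on this table:

- SFFS: after reaching (0, 1, 3), the conditional exclusion finds that (1, 3) beats both (0, 1, 3) and the earlier best pair (0, 3). The next inclusion then reaches (1, 2, 3).
- SBS and SBFS: both also end at (1, 2, 3), because dropping feature 0 from the full set is the best first exclusion.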
Example 3 - Visualizing the results in DataFrames For our convenience, we can visualize the output from the feature selection in a pandas DataFrame format using the get_metric_dict method of the SequentialFeatureSelector object. The columns std_dev and std_err represent the standard deviation and standard error of the cross-validation scores, respectively. Below, we see the DataFrame of the Sequential Forward Selector from Example 2: import pandas as pd pd.DataFrame.from_dict(sfs.get_metric_dict()).T avg_score ci_bound cv_scores feature_idx feature_names std_dev std_err 1 0.952991 0.0660624 [0.974358974359, 0.948717948718, 0.88888888888... (3,) (3,) 0.0412122 0.0237939 2 0.959936 0.0494801 [0.974358974359, 0.948717948718, 0.91666666666... (2, 3) (2, 3) 0.0308676 0.0178214 3 0.972756 0.0315204 [0.974358974359, 1.0, 0.944444444444, 0.972222... (1, 2, 3) (1, 2, 3) 0.0196636 0.0113528 Now, let's compare it to the Sequential Backward Selector: pd.DataFrame.from_dict(sbs.get_metric_dict()).T avg_score ci_bound cv_scores feature_idx feature_names std_dev std_err 3 0.972756 0.0315204 [0.974358974359, 1.0, 0.944444444444, 0.972222... (1, 2, 3) (1, 2, 3) 0.0196636 0.0113528 4 0.952991 0.0372857 [0.974358974359, 0.948717948718, 0.91666666666... (0, 1, 2, 3) (0, 1, 2, 3) 0.0232602 0.0134293 We can see that both SFS and SBS found the same \"best\" 3 features; however, the intermediate steps were obviously different. The ci_bound column in the DataFrames above represents the confidence interval bound around the average cross-validation score. By default, a confidence interval of 95% is used, but we can use different confidence bounds via the confidence_interval parameter.
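The std_err and ci_bound columns can be reproduced by hand. The sketch below states assumptions about the formulas that were checked only against the numbers in the tables here:

- std_dev is the population standard deviation of the fold scores.
- std_err equals std_dev / sqrt(n - 1), i.e., the standard error of the mean computed with the sample standard deviation.
- ci_bound is std_err times a Student's t quantile with n degrees of freedom.

The fold scores are those of the 3-feature row, with the truncated fourth value taken to be 35/36. The t quantiles are hardcoded from a t-table rather than computed.

```python
import math


def metric_summary(cv_scores, t_value):
    '''Summarize fold scores like the avg_score, std_dev, std_err and
    ci_bound columns, using the assumed formulas described above.'''
    n = len(cv_scores)
    avg = sum(cv_scores) / n
    # population standard deviation (divides by n)
    std_dev = math.sqrt(sum((s - avg) ** 2 for s in cv_scores) / n)
    # standard error of the mean; equals sample std (divides by n - 1) / sqrt(n)
    std_err = std_dev / math.sqrt(n - 1)
    return {'avg_score': avg, 'std_dev': std_dev,
            'std_err': std_err, 'ci_bound': t_value * std_err}


# 4-fold scores of the 3-feature subset (fourth value assumed to be 35/36)
scores = [38 / 39, 39 / 39, 34 / 36, 35 / 36]
T_95_DF4 = 2.776445  # two-sided 95% Student's t quantile, 4 degrees of freedom
T_90_DF4 = 2.131847  # two-sided 90% Student's t quantile, 4 degrees of freedom

for label, t in (('95%', T_95_DF4), ('90%', T_90_DF4)):
    summary = metric_summary(scores, t)
    print(label, {key: round(value, 7) for key, value in summary.items()})
```

Both ci_bound values (about 0.0315204 and 0.0242024) agree with the 95% and 90% rows for the subset (1, 2, 3) to the displayed precision. Only the t multiplier changes with the confidence level.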
E.g., the confidence bounds for a 90% confidence interval can be obtained as follows: pd.DataFrame.from_dict(sbs.get_metric_dict(confidence_interval=0.90)).T avg_score ci_bound cv_scores feature_idx feature_names std_dev std_err 3 0.972756 0.0242024 [0.974358974359, 1.0, 0.944444444444, 0.972222... (1, 2, 3) (1, 2, 3) 0.0196636 0.0113528 4 0.952991 0.0286292 [0.974358974359, 0.948717948718, 0.91666666666... (0, 1, 2, 3) (0, 1, 2, 3) 0.0232602 0.0134293 Example 4 - Plotting the results After importing the little helper function plotting.plot_sequential_feature_selection , we can also visualize the results using matplotlib figures. from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs import matplotlib.pyplot as plt sfs = SFS(knn, k_features=4, forward=True, floating=False, scoring='accuracy', verbose=2, cv=5) sfs = sfs.fit(X, y) fig1 = plot_sfs(sfs.get_metric_dict(), kind='std_dev') plt.ylim([0.8, 1]) plt.title('Sequential Forward Selection (w.
StdDev)') plt.grid() plt.show() [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished [2018-05-06 12:49:18] Features: 1/4 -- score: 0.96[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished [2018-05-06 12:49:18] Features: 2/4 -- score: 0.966666666667[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s finished [2018-05-06 12:49:18] Features: 3/4 -- score: 0.953333333333[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s finished [2018-05-06 12:49:18] Features: 4/4 -- score: 0.973333333333 Example 5 - Sequential Feature Selection for Regression Similar to the classification examples above, the SequentialFeatureSelector also supports scikit-learn's estimators for regression. from sklearn.linear_model import LinearRegression from sklearn.datasets import load_boston boston = load_boston() X, y = boston.data, boston.target lr = LinearRegression() sfs = SFS(lr, k_features=13, forward=True, floating=False, scoring='neg_mean_squared_error', cv=10) sfs = sfs.fit(X, y) fig = plot_sfs(sfs.get_metric_dict(), kind='std_err') plt.title('Sequential Forward Selection (w. StdErr)') plt.grid() plt.show() Example 6 -- Feature Selection with Fixed Train/Validation Splits If you do not wish to use cross-validation (here: k-fold cross-validation, i.e., rotating training and validation folds), you can use the PredefinedHoldoutSplit class to specify your own, fixed training and validation split. 
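Conceptually, a fixed split is a single (train, validation) pair of index lists. The helper fixed_holdout_split below is hypothetical. It illustrates that idea with the standard library only and is not how PredefinedHoldoutSplit is implemented. The API section describes cv as accepting an iterable yielding train, test splits, so a one-element list like this should in principle also be usable as cv. That has not been verified here; the example below uses PredefinedHoldoutSplit.

```python
def fixed_holdout_split(n_samples, valid_indices):
    '''Return a one-element list [(train_idx, valid_idx)]: every sample that
    is not reserved for validation is used for training.'''
    valid = sorted(set(valid_indices))
    reserved = set(valid)
    train = [i for i in range(n_samples) if i not in reserved]
    return [(train, valid)]


splits = fixed_holdout_split(10, [7, 2, 5])
print(splits)  # [([0, 1, 3, 4, 6, 8, 9], [2, 5, 7])]
```

Because there is only one split, each feature subset is scored on the same validation samples. The scores are therefore cheaper to compute but noisier than k-fold averages.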
from sklearn.datasets import load_iris from mlxtend.evaluate import PredefinedHoldoutSplit import numpy as np iris = load_iris() X = iris.data y = iris.target rng = np.random.RandomState(123) my_validation_indices = rng.permutation(np.arange(150))[:30] print(my_validation_indices) [ 72 112 132 88 37 138 87 42 8 90 141 33 59 116 135 104 36 13 63 45 28 133 24 127 46 20 31 121 117 4] from sklearn.neighbors import KNeighborsClassifier from mlxtend.feature_selection import SequentialFeatureSelector as SFS knn = KNeighborsClassifier(n_neighbors=4) piter = PredefinedHoldoutSplit(my_validation_indices) sfs1 = SFS(knn, k_features=3, forward=True, floating=False, verbose=2, scoring='accuracy', cv=piter) sfs1 = sfs1.fit(X, y) [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished [2018-09-24 02:31:21] Features: 1/3 -- score: 0.9666666666666667[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished [2018-09-24 02:31:21] Features: 2/3 -- score: 0.9666666666666667[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s finished [2018-09-24 02:31:21] Features: 3/3 -- score: 0.9666666666666667 Example 7 -- Using the Selected Feature Subset For Making New Predictions # Initialize the dataset from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=1) knn = KNeighborsClassifier(n_neighbors=4) # Select the \"best\" three features via # 5-fold cross-validation on the training set. 
from mlxtend.feature_selection import SequentialFeatureSelector as SFS sfs1 = SFS(knn, k_features=3, forward=True, floating=False, scoring='accuracy', cv=5) sfs1 = sfs1.fit(X_train, y_train) print('Selected features:', sfs1.k_feature_idx_) Selected features: (1, 2, 3) # Generate the new subsets based on the selected features # Note that the transform call is equivalent to # X_train[:, sfs1.k_feature_idx_] X_train_sfs = sfs1.transform(X_train) X_test_sfs = sfs1.transform(X_test) # Fit the estimator using the new feature subset # and make a prediction on the test data knn.fit(X_train_sfs, y_train) y_pred = knn.predict(X_test_sfs) # Compute the accuracy of the prediction acc = float((y_test == y_pred).sum()) / y_pred.shape[0] print('Test set accuracy: %.2f %%' % (acc * 100)) Test set accuracy: 96.00 % Example 8 -- Sequential Feature Selection and GridSearch # Initialize the dataset from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=1) Use scikit-learn's GridSearchCV to tune the hyperparameters inside and outside the SequentialFeatureSelector : from sklearn.model_selection import GridSearchCV from sklearn.pipeline import Pipeline from mlxtend.feature_selection import SequentialFeatureSelector as SFS import mlxtend knn = KNeighborsClassifier(n_neighbors=2) sfs1 = SFS(estimator=knn, k_features=3, forward=True, floating=False, scoring='accuracy', cv=5) pipe = Pipeline([('sfs', sfs1), ('knn', knn)]) param_grid = [ {'sfs__k_features': [1, 2, 3, 4], 'sfs__estimator__n_neighbors': [1, 2, 3, 4]} ] gs = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring='accuracy', n_jobs=1, cv=5, refit=False) # run grid search gs = gs.fit(X_train, y_train) ... and the \"best\" parameters determined by GridSearch are ...
print(\"Best parameters via GridSearch\", gs.best_params_) Best parameters via GridSearch {'sfs__estimator__n_neighbors': 1, 'sfs__k_features': 3} Obtaining the best k feature indices after GridSearch If we are interested in the best k feature indices via SequentialFeatureSelector.k_feature_idx_ , we have to initialize a GridSearchCV object with refit=True . Now, the grid search object will take the complete training dataset and the best parameters, which it found via cross-validation, to train the estimator pipeline. gs = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring='accuracy', n_jobs=1, cv=5, refit=True) gs = gs.fit(X_train, y_train) After running the grid search, we can access the individual pipeline objects of the best_estimator_ via the steps attribute. gs.best_estimator_.steps [('sfs', SequentialFeatureSelector(clone_estimator=True, cv=5, estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=1, p=2, weights='uniform'), floating=False, forward=True, k_features=3, n_jobs=1, pre_dispatch='2*n_jobs', scoring='accuracy', verbose=0)), ('knn', KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=2, p=2, weights='uniform'))] Via sub-indexing, we can then obtain the best-selected feature subset: print('Best features:', gs.best_estimator_.steps[0][1].k_feature_idx_) Best features: (0, 1, 3) During cross-validation, this feature combination had a CV accuracy of: print('Best score:', gs.best_score_) Best score: 0.94 gs.best_params_ {'sfs__estimator__n_neighbors': 1, 'sfs__k_features': 3} Alternatively, if we ran GridSearchCV with refit=False , we can set the \"best grid search parameters\" in our pipeline manually.
It should yield the same results: pipe.set_params(**gs.best_params_).fit(X_train, y_train) print('Best features:', pipe.steps[0][1].k_feature_idx_) Best features: (0, 1, 3) Example 9 -- Selecting the \"best\" feature combination in a k-range If k_features is set to a tuple (min_k, max_k) (new in 0.4.2), the SFS will now select the best feature combination that it discovered by iterating from k=1 to max_k (forward), or max_k to min_k (backward). The size of the returned feature subset is then between min_k and max_k , depending on which combination scored best during cross-validation. X.shape (150, 4) from mlxtend.feature_selection import SequentialFeatureSelector as SFS from sklearn.neighbors import KNeighborsClassifier from mlxtend.data import wine_data from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from sklearn.pipeline import make_pipeline X, y = wine_data() X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3, random_state=1) knn = KNeighborsClassifier(n_neighbors=2) sfs1 = SFS(estimator=knn, k_features=(3, 10), forward=True, floating=False, scoring='accuracy', cv=5) pipe = make_pipeline(StandardScaler(), sfs1) pipe.fit(X_train, y_train) print('best combination (ACC: %.3f): %s\\n' % (sfs1.k_score_, sfs1.k_feature_idx_)) print('all subsets:\\n', sfs1.subsets_) plot_sfs(sfs1.get_metric_dict(), kind='std_err'); best combination (ACC: 0.992): (0, 1, 2, 3, 6, 8, 9, 10, 11, 12) all subsets: {1: {'feature_idx': (6,), 'cv_scores': array([ 0.84615385, 0.6 , 0.88 , 0.79166667, 0.875 ]), 'avg_score': 0.7985641025641026, 'feature_names': ('6',)}, 2: {'feature_idx': (6, 9), 'cv_scores': array([ 0.92307692, 0.88 , 1. , 0.95833333, 0.91666667]), 'avg_score': 0.93561538461538463, 'feature_names': ('6', '9')}, 3: {'feature_idx': (6, 9, 12), 'cv_scores': array([ 0.92307692, 0.92 , 0.96 , 1.
, 0.95833333]), 'avg_score': 0.95228205128205123, 'feature_names': ('6', '9', '12')}, 4: {'feature_idx': (3, 6, 9, 12), 'cv_scores': array([ 0.96153846, 0.96 , 0.96 , 1. , 0.95833333]), 'avg_score': 0.96797435897435891, 'feature_names': ('3', '6', '9', '12')}, 5: {'feature_idx': (3, 6, 9, 10, 12), 'cv_scores': array([ 0.92307692, 0.96 , 1. , 1. , 1. ]), 'avg_score': 0.97661538461538466, 'feature_names': ('3', '6', '9', '10', '12')}, 6: {'feature_idx': (2, 3, 6, 9, 10, 12), 'cv_scores': array([ 0.92307692, 0.96 , 1. , 0.95833333, 1. ]), 'avg_score': 0.96828205128205125, 'feature_names': ('2', '3', '6', '9', '10', '12')}, 7: {'feature_idx': (0, 2, 3, 6, 9, 10, 12), 'cv_scores': array([ 0.92307692, 0.92 , 1. , 1. , 1. ]), 'avg_score': 0.96861538461538466, 'feature_names': ('0', '2', '3', '6', '9', '10', '12')}, 8: {'feature_idx': (0, 2, 3, 6, 8, 9, 10, 12), 'cv_scores': array([ 1. , 0.92, 1. , 1. , 1. ]), 'avg_score': 0.98399999999999999, 'feature_names': ('0', '2', '3', '6', '8', '9', '10', '12')}, 9: {'feature_idx': (0, 2, 3, 6, 8, 9, 10, 11, 12), 'cv_scores': array([ 1. , 0.92, 1. , 1. , 1. ]), 'avg_score': 0.98399999999999999, 'feature_names': ('0', '2', '3', '6', '8', '9', '10', '11', '12')}, 10: {'feature_idx': (0, 1, 2, 3, 6, 8, 9, 10, 11, 12), 'cv_scores': array([ 1. , 0.96, 1. , 1. , 1. ]), 'avg_score': 0.99199999999999999, 'feature_names': ('0', '1', '2', '3', '6', '8', '9', '10', '11', '12')}} Example 10 -- Using other cross-validation schemes In addition to standard k-fold and stratified k-fold, other cross-validation schemes can be used with SequentialFeatureSelector , for example, scikit-learn's GroupKFold or LeaveOneOut.
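The property that matters for group-wise cross-validation is that no group is split between training and test data. Below is a minimal stdlib sketch of that idea. group_k_fold is a hypothetical helper and not scikit-learn's GroupKFold. It produces the same kind of list of (train, test) index pairs that the cv parameter accepts. Groups are assigned greedily, largest first, to the currently smallest fold; that balancing rule is an assumption of this sketch.

```python
from collections import Counter


def group_k_fold(groups, n_splits):
    '''Yield (train, test) index lists so that all samples of a group land on
    the same side of every split. Groups are assigned greedily, largest
    first, to the fold that currently holds the fewest samples.'''
    group_sizes = Counter(groups)
    fold_of_group = {}
    fold_sizes = [0] * n_splits
    for group, size in sorted(group_sizes.items(), key=lambda item: (-item[1], item[0])):
        fold = fold_sizes.index(min(fold_sizes))
        fold_of_group[group] = fold
        fold_sizes[fold] += size
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train, test


# 150 samples in 15 consecutive groups of 10, as in the GroupKFold example
groups = [i // 10 for i in range(150)]
cv = list(group_k_fold(groups, n_splits=4))
print([len(test) for _, test in cv])  # [40, 40, 40, 30]
```

The generator is wrapped in list() so that the same splits can be iterated once per candidate feature subset; a bare generator would be exhausted after the first pass.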
Using GroupKFold with SequentialFeatureSelector from mlxtend.feature_selection import SequentialFeatureSelector as SFS from sklearn.neighbors import KNeighborsClassifier from mlxtend.data import iris_data from sklearn.model_selection import GroupKFold import numpy as np X, y = iris_data() groups = np.arange(len(y)) // 10 print('groups: {}'.format(groups)) groups: [ 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 14] Calling the split() method of a scikit-learn cross-validator object will return a generator that yields train, test splits. cv_gen = GroupKFold(4).split(X, y, groups) cv_gen The cv parameter of SequentialFeatureSelector must be either an int or an iterable yielding train, test splits. This iterable can be constructed by passing the train, test split generator to the built-in list() function. cv = list(cv_gen) knn = KNeighborsClassifier(n_neighbors=2) sfs = SFS(estimator=knn, k_features=2, scoring='accuracy', cv=cv) sfs.fit(X, y) print('best combination (ACC: %.3f): %s\\n' % (sfs.k_score_, sfs.k_feature_idx_)) best combination (ACC: 0.940): (2, 3) Example 11 - Working with pandas DataFrames Optionally, we can also use pandas DataFrames and pandas Series as input to the fit function. In this case, the column names of the pandas DataFrame will be used as feature names. However, note that if custom_feature_names are provided in the fit function, these custom_feature_names take precedence over the DataFrame column-based feature names.
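The naming rule described above amounts to a precedence order: custom_feature_names first, then DataFrame column names, then string versions of the column indices. The helper resolve_feature_names below is hypothetical. It only mirrors that documented behavior and is not part of mlxtend's API.

```python
def resolve_feature_names(feature_idx, custom_feature_names=None, columns=None):
    '''Map selected feature indices to names using the precedence
    custom_feature_names > DataFrame columns > string indices.'''
    if custom_feature_names is not None:
        return tuple(custom_feature_names[i] for i in feature_idx)
    if columns is not None:
        return tuple(columns[i] for i in feature_idx)
    return tuple(str(i) for i in feature_idx)


columns = ['sepal length', 'sepal width', 'petal length', 'petal width']
print(resolve_feature_names((2, 3)))                   # ('2', '3')
print(resolve_feature_names((2, 3), columns=columns))  # ('petal length', 'petal width')
print(resolve_feature_names((3,), custom_feature_names=('a', 'b', 'c', 'd'),
                            columns=columns))          # ('d',)
```

The lookup is positional: a name is taken by column index. The order of the names passed in must therefore match the column order of X.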
import pandas as pd from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from mlxtend.feature_selection import SequentialFeatureSelector as SFS iris = load_iris() X = iris.data y = iris.target knn = KNeighborsClassifier(n_neighbors=4) sfs1 = SFS(knn, k_features=3, forward=True, floating=False, scoring='accuracy', cv=0) X_df = pd.DataFrame(X, columns=['sepal len', 'sepal width', 'petal len', 'petal width']) X_df.head() sepal len sepal width petal len petal width 0 5.1 3.5 1.4 0.2 1 4.9 3.0 1.4 0.2 2 4.7 3.2 1.3 0.2 3 4.6 3.1 1.5 0.2 4 5.0 3.6 1.4 0.2 Also, the target array, y , can optionally be cast as a Series: y_series = pd.Series(y) y_series.head() 0 0 1 0 2 0 3 0 4 0 dtype: int64 sfs1 = sfs1.fit(X_df, y_series) Note that the only difference when passing a pandas DataFrame as input is that the 'feature_names' entries in sfs1.subsets_ now contain the DataFrame's column names: sfs1.subsets_ {1: {'avg_score': 0.95999999999999996, 'cv_scores': array([ 0.96]), 'feature_idx': (3,), 'feature_names': ('petal width',)}, 2: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (2, 3), 'feature_names': ('petal len', 'petal width')}, 3: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (1, 2, 3), 'feature_names': ('sepal width', 'petal len', 'petal width')}} In mlxtend version >= 0.13, pandas DataFrames are supported as feature inputs to the SequentialFeatureSelector in addition to NumPy arrays and other NumPy-like array types. API SequentialFeatureSelector(estimator, k_features=1, forward=True, floating=False, verbose=0, scoring=None, cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Sequential Feature Selection for Classification and Regression.
Parameters estimator : scikit-learn classifier or regressor k_features : int or tuple or str (default: 1) Number of features to select, where k_features < the full feature set. New in 0.4.2: A tuple containing a min and max value can be provided, and the SFS will return any feature combination between min and max that scored highest in cross-validation. For example, the tuple (1, 4) will return any combination from 1 up to 4 features instead of a fixed number of features k. New in 0.8.0: A string argument \"best\" or \"parsimonious\". If \"best\" is provided, the feature selector will return the feature subset with the best cross-validation performance. If \"parsimonious\" is provided as an argument, the smallest feature subset that is within one standard error of the best cross-validation performance will be selected. forward : bool (default: True) Forward selection if True, backward selection otherwise. floating : bool (default: False) Adds a conditional exclusion/inclusion if True. verbose : int (default: 0), level of verbosity to use in logging. If 0, no output, if 1 number of features in current set, if 2 detailed logging including timestamp and cv scores at each step. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform to sklearn's signature scorer(estimator, X, y) ; see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. cv : int (default: 5) Integer or iterable yielding train, test splits. If cv is an integer and estimator is a classifier (or y consists of integer class labels), stratified k-fold cross-validation is performed.
Otherwise regular k-fold cross-validation is performed. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int, or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1 . Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs. An int, giving the exact number of total jobs that are spawned. A string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, it is required to set cv=0 and n_jobs=1. Attributes k_feature_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. k_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. k_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the sequential selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset) 'feature_names' (tuple of feature names of the feat.
subset) 'cv_scores' (list of individual cross-validation scores) 'avg_score' (average cross-validation score) Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/ Methods fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: a pandas Series is now also accepted as argument for y. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names_ and self.subsets_[i]['feature_names'] . (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data then reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: a pandas Series is now also accepted as argument for y. fit_params : dict of string -> object, optional Parameters to pass to the fit method of classifier.
Returns Reduced feature subset of X, shape={n_samples, k_features} get_metric_dict(confidence_interval=0.95) Return metric dictionary. Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset 'cv_scores': list with individual CV scores 'avg_score': average of the CV scores 'std_dev': standard deviation of the CV score average 'std_err': standard error of the CV score average 'ci_bound': confidence interval bound of the CV score average get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X) Reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X.
Returns Reduced feature subset of X, shape={n_samples, k_features}","title":"Sequential Feature Selector"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#sequential-feature-selector","text":"Implementation of sequential feature algorithms (SFAs) -- greedy search algorithms -- that have been developed as a suboptimal solution to the often computationally infeasible exhaustive search. from mlxtend.feature_selection import SequentialFeatureSelector","title":"Sequential Feature Selector"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#overview","text":"Sequential feature selection algorithms are a family of greedy search algorithms that are used to reduce an initial d -dimensional feature space to a k -dimensional feature subspace where k < d . The motivation behind feature selection algorithms is to automatically select a subset of features that is most relevant to the problem. The goal of feature selection is two-fold: We want to improve the computational efficiency and reduce the generalization error of the model by removing irrelevant features or noise. A wrapper approach such as sequential feature selection is especially useful if embedded feature selection -- for example, a regularization penalty like LASSO -- is not applicable. In a nutshell, SFAs remove or add one feature at a time based on the classifier performance until a feature subset of the desired size k is reached. There are 4 different flavors of SFAs available via the SequentialFeatureSelector : Sequential Forward Selection (SFS), Sequential Backward Selection (SBS), Sequential Forward Floating Selection (SFFS), and Sequential Backward Floating Selection (SBFS). The floating variants, SFFS and SBFS, can be considered as extensions to the simpler SFS and SBS algorithms.
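In SequentialFeatureSelector, the four variants are chosen with the two boolean parameters forward and floating (see the parameter list above and Example 2). The tiny lookup below spells out that mapping. sfa_variant is a hypothetical helper for illustration, not part of mlxtend.

```python
def sfa_variant(forward, floating):
    # Hypothetical helper: map the forward/floating flags of
    # SequentialFeatureSelector to the name of the corresponding SFA
    if forward:
        return 'SFFS' if floating else 'SFS'
    return 'SBFS' if floating else 'SBS'


for forward in (True, False):
    for floating in (False, True):
        print(forward, floating, sfa_variant(forward, floating))
```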
The floating algorithms have an additional exclusion or inclusion step to remove features after they were included (or to add features back after they were excluded), so that a larger number of feature subset combinations can be sampled. It is important to emphasize that this step is conditional and only occurs if the resulting feature subset is assessed as \"better\" by the criterion function after removal (or addition) of a particular feature. Furthermore, I added an optional check to skip the conditional exclusion steps if the algorithm gets stuck in cycles. How is this different from Recursive Feature Elimination (RFE) -- e.g., as implemented in sklearn.feature_selection.RFE ? RFE is computationally less complex: it uses the feature weight coefficients (e.g., linear models) or feature importances (tree-based algorithms) to eliminate features recursively, whereas SFAs eliminate (or add) features based on a user-defined classifier/regression performance metric. The SFAs are outlined in pseudo code below:","title":"Overview"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#sequential-forward-selection-sfs","text":"Input: Y = \\{y_1, y_2, ..., y_d\\} The SFS algorithm takes the whole d -dimensional feature set as input. Output: X_k = \\{x_j \\; | \\;j = 1, 2, ..., k; \\; x_j \\in Y\\} , where k = (0, 1, 2, ..., d) SFS returns a subset of features; the number of selected features k , where k < d , has to be specified a priori . Initialization: X_0 = \\emptyset , k = 0 We initialize the algorithm with an empty set \\emptyset (\"null set\") so that k = 0 (where k is the size of the subset). Step 1 (Inclusion): x^+ = \\text{ arg max } J(X_k + x), \\text{ where } x \\in Y - X_k X_{k+1} = X_k + x^+ k = k + 1 Go to Step 1 In this step, we add an additional feature, x^+ , to our feature subset X_k . x^+ is the feature that maximizes our criterion function, that is, the feature that is associated with the best classifier performance if it is added to X_k .
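The inclusion step, repeated until the termination criterion k = p described next is met, can be sketched in a few lines of plain Python. This is a minimal illustration, not mlxtend's implementation. The function names and the toy criterion are assumptions made for the example: additive per-feature weights plus a bonus for one feature pair stand in for a cross-validated classifier score.

```python
def sfs(features, criterion, p):
    # Initialization: X_0 = empty set, k = 0
    selected = []
    # Termination: stop once k = p features have been selected
    while len(selected) < p:
        # Step 1 (Inclusion): x+ = arg max J(X_k + x) over x in Y - X_k
        remaining = [f for f in features if f not in selected]
        x_plus = max(remaining, key=lambda f: criterion(selected + [f]))
        # X_{k+1} = X_k + x+, k = k + 1
        selected = selected + [x_plus]
    return selected


# Toy criterion (assumption): additive feature weights plus a bonus
# when features 'b' and 'c' are selected together
WEIGHTS = {'a': 10, 'b': 9, 'c': 8, 'd': 1}


def toy_criterion(subset):
    score = sum(WEIGHTS[f] for f in subset)
    if 'b' in subset and 'c' in subset:
        score += 6
    return score


print(sfs(['a', 'b', 'c', 'd'], toy_criterion, p=3))
```

Each round evaluates every remaining feature once. A full run to size p therefore needs on the order of p * d criterion evaluations instead of the much larger number an exhaustive search would need.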
We repeat this procedure until the termination criterion is satisfied. Termination: k = p We add features to the feature subset X_k until it contains the number of desired features p that we specified a priori .","title":"Sequential Forward Selection (SFS)"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#sequential-backward-selection-sbs","text":"Input: the set of all features, Y = \\{y_1, y_2, ..., y_d\\} The SBS algorithm takes the whole feature set as input. Output: X_k = \\{x_j \\; | \\;j = 1, 2, ..., k; \\; x_j \\in Y\\} , where k = (0, 1, 2, ..., d) SBS returns a subset of features; the number of selected features k , where k < d , has to be specified a priori . Initialization: X_d = Y , k = d We initialize the algorithm with the given feature set so that k = d . Step 1 (Exclusion): x^- = \\text{ arg max } J(X_k - x), \\text{ where } x \\in X_k X_{k-1} = X_k - x^- k = k - 1 Go to Step 1 In this step, we remove a feature, x^- , from our feature subset X_k . x^- is the feature that maximizes our criterion function upon removal, that is, the feature that is associated with the best classifier performance if it is removed from X_k . We repeat this procedure until the termination criterion is satisfied. Termination: k = p We remove features from the feature subset X_k until it contains the number of desired features p that we specified a priori .","title":"Sequential Backward Selection (SBS)"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#sequential-backward-floating-selection-sbfs","text":"Input: the set of all features, Y = \\{y_1, y_2, ..., y_d\\} The SBFS algorithm takes the whole feature set as input. Output: X_k = \\{x_j \\; | \\;j = 1, 2, ..., k; \\; x_j \\in Y\\} , where k = (0, 1, 2, ..., d) SBFS returns a subset of features; the number of selected features k , where k < d , has to be specified a priori .
Initialization: X_d = Y , k = d We initialize the algorithm with the given feature set so that k = d . Step 1 (Exclusion): x^- = \\text{ arg max } J(X_k - x), \\text{ where } x \\in X_k X_{k-1} = X_k - x^- k = k - 1 Go to Step 2 In this step, we remove a feature, x^- , from our feature subset X_k . x^- is the feature that maximizes our criterion function upon removal, that is, the feature that is associated with the best classifier performance if it is removed from X_k . Step 2 (Conditional Inclusion): x^+ = \\text{ arg max } J(X_k + x), \\text{ where } x \\in Y - X_k if J(X_k + x^+) > J(X_k) : X_{k+1} = X_k + x^+ k = k + 1 Go to Step 1 In Step 2, we search for features that improve the classifier performance if they are added back to the feature subset. If such features exist, we add the feature x^+ for which the performance improvement is maximized. If k = d or an improvement cannot be made (i.e., such feature x^+ cannot be found), go back to step 1; else, repeat this step. Termination: k = p We remove features from the feature subset X_k until it contains the number of desired features p that we specified a priori .","title":"Sequential Backward Floating Selection (SBFS)"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#sequential-forward-floating-selection-sffs","text":"Input: the set of all features, Y = \\{y_1, y_2, ..., y_d\\} The SFFS algorithm takes the whole feature set as input, e.g., if our feature space consists of 10 dimensions ( d = 10 ). Output: a subset of features, X_k = \\{x_j \\; | \\;j = 1, 2, ..., k; \\; x_j \\in Y\\} , where k = (0, 1, 2, ..., d) The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ( k = 5, d = 10 ).
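The SFFS procedure defined by this input and output, with the inclusion and conditional-exclusion steps listed next, can be sketched in plain Python. This is a simplified illustration, not mlxtend's implementation.

One assumption deserves a loud label. Step 2 says a feature is removed only if the smaller subset scores higher. The sketch reads that as: the smaller subset must beat the best subset of the same size recorded so far. This is close in spirit to the formulation of Pudil et al. (1994), and it guarantees that the loop terminates instead of cycling.

The lookup table J is invented toy data, chosen so that floating finds a better 3-feature subset than plain SFS would.

```python
def sffs(features, criterion, p):
    current = []
    best_score = {}   # subset size -> best criterion value seen so far
    best_subset = {}  # subset size -> subset that achieved best_score

    def record(subset):
        k = len(subset)
        score = criterion(subset)
        if k not in best_score or score > best_score[k]:
            best_score[k] = score
            best_subset[k] = list(subset)

    # Initialization: X_0 = empty set, k = 0; terminate once k = p
    while len(current) < p:
        # Step 1 (Inclusion): add x+ = arg max J(X_k + x) over x in Y - X_k
        remaining = [f for f in features if f not in current]
        x_plus = max(remaining, key=lambda f: criterion(current + [f]))
        current = current + [x_plus]
        record(current)

        # Step 2 (Conditional Exclusion), only while k > 2
        while len(current) > 2:
            x_minus = max(current,
                          key=lambda f: criterion([g for g in current if g != f]))
            reduced = [g for g in current if g != x_minus]
            # Accept only a strict improvement over the best subset of size k - 1
            if criterion(reduced) > best_score[len(reduced)]:
                current = reduced
                record(current)
            else:
                break  # no improvement: go back to Step 1
    return best_subset[p]


# Toy criterion values J(subset), invented for illustration
J = {
    frozenset('a'): 10, frozenset('b'): 9, frozenset('c'): 8, frozenset('d'): 1,
    frozenset('ab'): 12, frozenset('ac'): 11, frozenset('ad'): 10.5,
    frozenset('bc'): 15, frozenset('bd'): 9, frozenset('cd'): 8,
    frozenset('abc'): 16, frozenset('abd'): 13, frozenset('acd'): 12,
    frozenset('bcd'): 20,
}


def toy_criterion(subset):
    return J[frozenset(subset)]


print(sffs(list('abcd'), toy_criterion, p=3))
```

With these toy values, plain forward selection would stop at ['a', 'b', 'c'] (J = 16). The conditional exclusion instead lets SFFS drop 'a' and then reach ['b', 'c', 'd'] (J = 20).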
Initialization: X_0 = \\emptyset , k = 0 We initialize the algorithm with an empty set (\"null set\") so that k = 0 (where k is the size of the subset) Step 1 (Inclusion): x^+ = \\text{ arg max } J(X_k + x), \\text{ where } x \\in Y - X_k X_{k+1} = X_k + x^+ k = k + 1 Go to Step 2 Step 2 (Conditional Exclusion): x^- = \\text{ arg max } J(X_k - x), \\text{ where } x \\in X_k if \\; J(X_k - x^-) > J(X_k) : X_{k-1} = X_k - x^- k = k - 1 Go to Step 1 In step 1, we include the feature from the feature space that leads to the best performance increase for our feature subset (assessed by the criterion function ). Then, we go over to step 2. In step 2, we only remove a feature if the resulting subset would gain an increase in performance. If k = 2 or an improvement cannot be made (i.e., such feature x^- cannot be found), go back to step 1; else, repeat this step. Steps 1 and 2 are repeated until the termination criterion is reached. Termination: stop when k equals the number of desired features","title":"Sequential Forward Floating Selection (SFFS)"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#references","text":"Ferri, F. J., Pudil, P., Hatef, M., & Kittler, J. (1994). \"Comparative study of techniques for large-scale feature selection.\" Pattern Recognition in Practice IV : 403-413. Pudil, P., Novovi\u010dov\u00e1, J., & Kittler, J. (1994). \"Floating search methods in feature selection.\" Pattern Recognition Letters 15.11 (1994): 1119-1125.","title":"References"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-1-a-simple-sequential-forward-selection-example","text":"Initializing a simple classifier from scikit-learn: from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris iris = load_iris() X = iris.data y = iris.target knn = KNeighborsClassifier(n_neighbors=4) We start by selecting the \"best\" 3 features from the Iris dataset via Sequential Forward Selection (SFS).
Here, we set forward=True and floating=False . By choosing cv=0 , we don't perform any cross-validation; therefore, the performance (here: 'accuracy' ) is computed entirely on the training set. from mlxtend.feature_selection import SequentialFeatureSelector as SFS sfs1 = SFS(knn, k_features=3, forward=True, floating=False, verbose=2, scoring='accuracy', cv=0) sfs1 = sfs1.fit(X, y) [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 1/3 -- score: 0.96[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 2/3 -- score: 0.973333333333[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 3/3 -- score: 0.973333333333 Via the subsets_ attribute, we can take a look at the selected feature indices at each step: sfs1.subsets_ {1: {'avg_score': 0.95999999999999996, 'cv_scores': array([ 0.96]), 'feature_idx': (3,), 'feature_names': ('3',)}, 2: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (2, 3), 'feature_names': ('2', '3')}, 3: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (1, 2, 3), 'feature_names': ('1', '2', '3')}} Note that the 'feature_names' entry is simply a string representation of the 'feature_idx' in this case.
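Because subsets_ is an ordinary dictionary, it can be post-processed with plain Python. The sketch below copies the values printed above into a literal, so that it runs without mlxtend or scikit-learn, and recovers which feature was added at each step. The helper name added_features is hypothetical. The sketch relies on one property of forward selection without floating: each subset extends the previous one by exactly one feature.

```python
# Values copied from the sfs1.subsets_ output above
subsets = {
    1: {'avg_score': 0.96, 'feature_idx': (3,)},
    2: {'avg_score': 0.97333333333333338, 'feature_idx': (2, 3)},
    3: {'avg_score': 0.97333333333333338, 'feature_idx': (1, 2, 3)},
}


def added_features(subsets):
    # Map each subset size k to the feature index added at that step
    added = {}
    previous = set()
    for k in sorted(subsets):
        current = set(subsets[k]['feature_idx'])
        new = current - previous
        # Plain SFS grows a nested chain of subsets, one feature per step
        if not previous <= current or len(new) != 1:
            raise ValueError('subsets are not a nested forward-selection path')
        added[k] = new.pop()
        previous = current
    return added


print(added_features(subsets))
```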
Optionally, we can provide custom feature names via the fit method's custom_feature_names parameter: feature_names = ('sepal length', 'sepal width', 'petal length', 'petal width') sfs1 = sfs1.fit(X, y, custom_feature_names=feature_names) sfs1.subsets_ [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 1/3 -- score: 0.96[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 2/3 -- score: 0.973333333333[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s finished [2018-05-06 12:49:16] Features: 3/3 -- score: 0.973333333333 {1: {'avg_score': 0.95999999999999996, 'cv_scores': array([ 0.96]), 'feature_idx': (3,), 'feature_names': ('petal width',)}, 2: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (2, 3), 'feature_names': ('petal length', 'petal width')}, 3: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (1, 2, 3), 'feature_names': ('sepal width', 'petal length', 'petal width')}} Furthermore, we can access the indices of the 3 best features directly via the k_feature_idx_ attribute: sfs1.k_feature_idx_ (1, 2, 3) And similarly, to obtain the names of these features, given that we provided an argument to the custom_feature_names parameter, we can refer to the sfs1.k_feature_names_ attribute: sfs1.k_feature_names_ ('sepal width', 'petal length', 'petal width') Finally, the prediction score for these 3 features can be accessed via k_score_ : sfs1.k_score_ 0.97333333333333338","title":"Example 1 - A simple Sequential Forward Selection example"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-2-toggling-between-sfs-sbs-sffs-and-sbfs","text":"Using the forward and floating parameters,
we can toggle between SFS, SBS, SFFS, and SBFS as shown below. Note that we are performing (stratified) 4-fold cross-validation for more robust estimates in contrast to Example 1. Via n_jobs=-1 , we choose to evaluate the candidate feature subsets in parallel on all available CPU cores. # Sequential Forward Selection sfs = SFS(knn, k_features=3, forward=True, floating=False, scoring='accuracy', cv=4, n_jobs=-1) sfs = sfs.fit(X, y) print('\\nSequential Forward Selection (k=3):') print(sfs.k_feature_idx_) print('CV Score:') print(sfs.k_score_) ################################################### # Sequential Backward Selection sbs = SFS(knn, k_features=3, forward=False, floating=False, scoring='accuracy', cv=4, n_jobs=-1) sbs = sbs.fit(X, y) print('\\nSequential Backward Selection (k=3):') print(sbs.k_feature_idx_) print('CV Score:') print(sbs.k_score_) ################################################### # Sequential Forward Floating Selection sffs = SFS(knn, k_features=3, forward=True, floating=True, scoring='accuracy', cv=4, n_jobs=-1) sffs = sffs.fit(X, y) print('\\nSequential Forward Floating Selection (k=3):') print(sffs.k_feature_idx_) print('CV Score:') print(sffs.k_score_) ################################################### # Sequential Backward Floating Selection sbfs = SFS(knn, k_features=3, forward=False, floating=True, scoring='accuracy', cv=4, n_jobs=-1) sbfs = sbfs.fit(X, y) print('\\nSequential Backward Floating Selection (k=3):') print(sbfs.k_feature_idx_) print('CV Score:') print(sbfs.k_score_) Sequential Forward Selection (k=3): (1, 2, 3) CV Score: 0.972756410256 Sequential Backward Selection (k=3): (1, 2, 3) CV Score: 0.972756410256 Sequential Forward Floating Selection (k=3): (1, 2, 3) CV Score: 0.972756410256 Sequential Backward Floating Selection (k=3): (1, 2, 3) CV Score: 0.972756410256 In this simple scenario, selecting the best 3 features out of the 4 available features in the Iris set, we end up with identical results regardless of which sequential selection
algorithms we used.","title":"Example 2 - Toggling between SFS, SBS, SFFS, and SBFS"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-3-visualizing-the-results-in-dataframes","text":"For convenience, we can visualize the output from the feature selection in a pandas DataFrame format using the get_metric_dict method of the SequentialFeatureSelector object. The columns std_dev and std_err represent the standard deviation and standard error of the cross-validation scores, respectively. Below, we see the DataFrame of the Sequential Forward Selector from Example 2: import pandas as pd pd.DataFrame.from_dict(sfs.get_metric_dict()).T avg_score ci_bound cv_scores feature_idx feature_names std_dev std_err 1 0.952991 0.0660624 [0.974358974359, 0.948717948718, 0.88888888888... (3,) (3,) 0.0412122 0.0237939 2 0.959936 0.0494801 [0.974358974359, 0.948717948718, 0.91666666666... (2, 3) (2, 3) 0.0308676 0.0178214 3 0.972756 0.0315204 [0.974358974359, 1.0, 0.944444444444, 0.972222... (1, 2, 3) (1, 2, 3) 0.0196636 0.0113528 Now, let's compare it to the Sequential Backward Selector: pd.DataFrame.from_dict(sbs.get_metric_dict()).T avg_score ci_bound cv_scores feature_idx feature_names std_dev std_err 3 0.972756 0.0315204 [0.974358974359, 1.0, 0.944444444444, 0.972222... (1, 2, 3) (1, 2, 3) 0.0196636 0.0113528 4 0.952991 0.0372857 [0.974358974359, 0.948717948718, 0.91666666666... (0, 1, 2, 3) (0, 1, 2, 3) 0.0232602 0.0134293 We can see that both SFS and SBS found the same \"best\" 3 features; however, the intermediate steps were obviously different.
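The std_dev, std_err, and ci_bound columns can be reproduced from the fold scores with the Python standard library (ci_bound is the confidence-interval bound discussed next). The sketch uses the four fold scores of the 3-feature row. The fourth score is truncated in the table, so 35/36 is an assumption; it reproduces the reported avg_score of 0.972756.

The table values are consistent with the following:
- std_dev is the population standard deviation of the fold scores.
- std_err is the sample standard deviation divided by the square root of the number of folds.
- ci_bound is std_err times the two-sided Student t critical value, with as many degrees of freedom as there are folds (4 here).

Treat this as an inference from the numbers shown, not a statement about mlxtend's source code.

```python
import math
import statistics

# Fold scores of the 3-feature subset from Example 2; the truncated fourth
# value is assumed to be 35/36 because that reproduces the reported average
cv_scores = [38 / 39, 1.0, 34 / 36, 35 / 36]

# Two-sided 95% critical value of Student's t with 4 degrees of freedom
# (assumption: degrees of freedom = number of folds, matching the table)
T_CRIT_95_DF4 = 2.776445

n = len(cv_scores)
avg_score = statistics.mean(cv_scores)
std_dev = statistics.pstdev(cv_scores)                # population std (ddof=0)
std_err = statistics.stdev(cv_scores) / math.sqrt(n)  # sample std / sqrt(n)
ci_bound = std_err * T_CRIT_95_DF4

print(round(avg_score, 6), round(std_dev, 7), round(std_err, 7), round(ci_bound, 7))
```

Replacing the critical value with 2.131847 (the 90% value for 4 degrees of freedom) gives roughly 0.0242024. That is the 90% ci_bound reported for the same 3-feature subset in the SBS table.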
The ci_bound column in the DataFrames above represents the bound (half-width) of the confidence interval around the average cross-validation score. By default, a confidence interval of 95% is used, but we can use different confidence bounds via the confidence_interval parameter. E.g., the confidence bounds for a 90% confidence interval can be obtained as follows: pd.DataFrame.from_dict(sbs.get_metric_dict(confidence_interval=0.90)).T avg_score ci_bound cv_scores feature_idx feature_names std_dev std_err 3 0.972756 0.0242024 [0.974358974359, 1.0, 0.944444444444, 0.972222... (1, 2, 3) (1, 2, 3) 0.0196636 0.0113528 4 0.952991 0.0286292 [0.974358974359, 0.948717948718, 0.91666666666... (0, 1, 2, 3) (0, 1, 2, 3) 0.0232602 0.0134293","title":"Example 3 - Visualizing the results in DataFrames"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-4-plotting-the-results","text":"After importing the little helper function plotting.plot_sequential_feature_selection , we can also visualize the results using matplotlib figures. from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs import matplotlib.pyplot as plt sfs = SFS(knn, k_features=4, forward=True, floating=False, scoring='accuracy', verbose=2, cv=5) sfs = sfs.fit(X, y) fig1 = plot_sfs(sfs.get_metric_dict(), kind='std_dev') plt.ylim([0.8, 1]) plt.title('Sequential Forward Selection (w.
StdDev)') plt.grid() plt.show() [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished [2018-05-06 12:49:18] Features: 1/4 -- score: 0.96[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished [2018-05-06 12:49:18] Features: 2/4 -- score: 0.966666666667[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s finished [2018-05-06 12:49:18] Features: 3/4 -- score: 0.953333333333[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s finished [2018-05-06 12:49:18] Features: 4/4 -- score: 0.973333333333","title":"Example 4 - Plotting the results"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-5-sequential-feature-selection-for-regression","text":"Similar to the classification examples above, the SequentialFeatureSelector also supports scikit-learn's estimators for regression. from sklearn.linear_model import LinearRegression from sklearn.datasets import load_boston boston = load_boston() X, y = boston.data, boston.target lr = LinearRegression() sfs = SFS(lr, k_features=13, forward=True, floating=False, scoring='neg_mean_squared_error', cv=10) sfs = sfs.fit(X, y) fig = plot_sfs(sfs.get_metric_dict(), kind='std_err') plt.title('Sequential Forward Selection (w. StdErr)') plt.grid() plt.show()","title":"Example 5 - Sequential Feature Selection for Regression"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-6-feature-selection-with-fixed-trainvalidation-splits","text":"If you do not wish to use cross-validation (here: k-fold cross-validation, i.e., rotating training and validation folds), you can use the PredefinedHoldoutSplit class to specify your own, fixed training and validation split. 
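Conceptually, a fixed holdout split is just an iterable that yields a single (train, validation) pair of index arrays. That is the kind of object the cv parameter accepts, and presumably what PredefinedHoldoutSplit provides.

The sketch below illustrates the idea in plain Python. It is not mlxtend's PredefinedHoldoutSplit implementation: the helper name holdout_split is hypothetical, and the validation indices are drawn with Python's random module. They therefore differ from the NumPy RandomState(123) permutation shown below.

```python
import random


def holdout_split(n_samples, valid_indices):
    # Hypothetical helper: yield a single (train, validation) pair of index
    # lists; every index not listed for validation is used for training
    valid = sorted(set(valid_indices))
    valid_set = set(valid)
    train = [i for i in range(n_samples) if i not in valid_set]
    yield train, valid


# Draw 30 validation indices out of 150 samples
rng = random.Random(123)
my_validation_indices = rng.sample(range(150), 30)

splits = list(holdout_split(150, my_validation_indices))
train_idx, valid_idx = splits[0]
print(len(splits), len(train_idx), len(valid_idx))
```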
from sklearn.datasets import load_iris from mlxtend.evaluate import PredefinedHoldoutSplit import numpy as np iris = load_iris() X = iris.data y = iris.target rng = np.random.RandomState(123) my_validation_indices = rng.permutation(np.arange(150))[:30] print(my_validation_indices) [ 72 112 132 88 37 138 87 42 8 90 141 33 59 116 135 104 36 13 63 45 28 133 24 127 46 20 31 121 117 4] from sklearn.neighbors import KNeighborsClassifier from mlxtend.feature_selection import SequentialFeatureSelector as SFS knn = KNeighborsClassifier(n_neighbors=4) piter = PredefinedHoldoutSplit(my_validation_indices) sfs1 = SFS(knn, k_features=3, forward=True, floating=False, verbose=2, scoring='accuracy', cv=piter) sfs1 = sfs1.fit(X, y) [Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s finished [2018-09-24 02:31:21] Features: 1/3 -- score: 0.9666666666666667[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished [2018-09-24 02:31:21] Features: 2/3 -- score: 0.9666666666666667[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s finished [2018-09-24 02:31:21] Features: 3/3 -- score: 0.9666666666666667","title":"Example 6 -- Feature Selection with Fixed Train/Validation Splits"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-7-using-the-selected-feature-subset-for-making-new-predictions","text":"# Initialize the dataset from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=1) knn = KNeighborsClassifier(n_neighbors=4) # Select the \"best\" three features via # 5-fold cross-validation on the training set. 
from mlxtend.feature_selection import SequentialFeatureSelector as SFS sfs1 = SFS(knn, k_features=3, forward=True, floating=False, scoring='accuracy', cv=5) sfs1 = sfs1.fit(X_train, y_train) print('Selected features:', sfs1.k_feature_idx_) Selected features: (1, 2, 3) # Generate the new subsets based on the selected features # Note that the transform call is equivalent to # X_train[:, sfs1.k_feature_idx_] X_train_sfs = sfs1.transform(X_train) X_test_sfs = sfs1.transform(X_test) # Fit the estimator using the new feature subset # and make a prediction on the test data knn.fit(X_train_sfs, y_train) y_pred = knn.predict(X_test_sfs) # Compute the accuracy of the prediction acc = float((y_test == y_pred).sum()) / y_pred.shape[0] print('Test set accuracy: %.2f %%' % (acc * 100)) Test set accuracy: 96.00 %","title":"Example 7 -- Using the Selected Feature Subset For Making New Predictions"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-8-sequential-feature-selection-and-gridsearch","text":"# Initialize the dataset from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=1) Use scikit-learn's GridSearch to tune the hyperparameters inside and outside the SequentialFeatureSelector : from sklearn.model_selection import GridSearchCV from sklearn.pipeline import Pipeline from mlxtend.feature_selection import SequentialFeatureSelector as SFS import mlxtend knn = KNeighborsClassifier(n_neighbors=2) sfs1 = SFS(estimator=knn, k_features=3, forward=True, floating=False, scoring='accuracy', cv=5) pipe = Pipeline([('sfs', sfs1), ('knn', knn)]) param_grid = [ {'sfs__k_features': [1, 2, 3, 4], 'sfs__estimator__n_neighbors': [1, 2, 3, 4]} ] gs = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring='accuracy', n_jobs=1, cv=5, 
refit=False) # run grid search gs = gs.fit(X_train, y_train) ... and the \"best\" parameters determined by GridSearch are ... print(\"Best parameters via GridSearch\", gs.best_params_) Best parameters via GridSearch {'sfs__estimator__n_neighbors': 1, 'sfs__k_features': 3}","title":"Example 8 -- Sequential Feature Selection and GridSearch"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#obtaining-the-best-k-feature-indices-after-gridsearch","text":"If we are interested in the best k feature indices via SequentialFeatureSelector.k_feature_idx_ , we have to initialize a GridSearchCV object with refit=True . Now, the grid search object will take the complete training dataset and the best parameters, which it found via cross-validation, to train the estimator pipeline. gs = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring='accuracy', n_jobs=1, cv=5, refit=True) gs = gs.fit(X_train, y_train) After running the grid search, we can access the individual pipeline objects of the best_estimator_ via the steps attribute.
gs.best_estimator_.steps [('sfs', SequentialFeatureSelector(clone_estimator=True, cv=5, estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=1, p=2, weights='uniform'), floating=False, forward=True, k_features=3, n_jobs=1, pre_dispatch='2*n_jobs', scoring='accuracy', verbose=0)), ('knn', KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=2, p=2, weights='uniform'))] Via sub-indexing, we can then obtain the best-selected feature subset: print('Best features:', gs.best_estimator_.steps[0][1].k_feature_idx_) Best features: (0, 1, 3) During cross-validation, this feature combination had a CV accuracy of: print('Best score:', gs.best_score_) Best score: 0.94 gs.best_params_ {'sfs__estimator__n_neighbors': 1, 'sfs__k_features': 3} Alternatively, we can set the \"best grid search parameters\" in our pipeline manually if we ran GridSearchCV with refit=False . This should yield the same results: pipe.set_params(**gs.best_params_).fit(X_train, y_train) print('Best features:', pipe.steps[0][1].k_feature_idx_) Best features: (0, 1, 3)","title":"Obtaining the best k feature indices after GridSearch"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-9-selecting-the-best-feature-combination-in-a-k-range","text":"If k_features is set to a tuple (min_k, max_k) (new in 0.4.2), the SFS will now select the best feature combination that it discovered by iterating from k=1 to max_k (forward), or max_k to min_k (backward). The size of the returned feature subset is then between min_k and max_k , depending on which combination scored best during cross-validation.
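Choosing among the subset sizes in a k-range amounts to taking the arg max of the average CV score over min_k <= k <= max_k. The sketch below applies that rule to the avg_score values (rounded) from the output shown next. The helper name best_k_in_range is an assumption of this sketch, and so is its tie-breaking (the smaller k wins); neither is documented mlxtend behavior.

```python
# avg_score per subset size k, copied (rounded) from the sfs1.subsets_ output
avg_scores = {
    1: 0.798564, 2: 0.935615, 3: 0.952282, 4: 0.967974, 5: 0.976615,
    6: 0.968282, 7: 0.968615, 8: 0.984, 9: 0.984, 10: 0.992,
}


def best_k_in_range(avg_scores, min_k, max_k):
    # Consider only subset sizes with min_k <= k <= max_k; because max()
    # returns the first maximum and sizes are sorted, ties favor the smaller k
    candidates = [k for k in sorted(avg_scores) if min_k <= k <= max_k]
    return max(candidates, key=lambda k: avg_scores[k])


print(best_k_in_range(avg_scores, 3, 10))
```

With the range (3, 10) used in this example, the rule picks k = 10, matching the reported best combination (ACC: 0.992). Narrowing the range to (3, 7) would pick k = 5.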
X.shape (150, 4) from mlxtend.feature_selection import SequentialFeatureSelector as SFS from sklearn.neighbors import KNeighborsClassifier from mlxtend.data import wine_data from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from sklearn.pipeline import make_pipeline X, y = wine_data() X_train, X_test, y_train, y_test= train_test_split(X, y, stratify=y, test_size=0.3, random_state=1) knn = KNeighborsClassifier(n_neighbors=2) sfs1 = SFS(estimator=knn, k_features=(3, 10), forward=True, floating=False, scoring='accuracy', cv=5) pipe = make_pipeline(StandardScaler(), sfs1) pipe.fit(X_train, y_train) print('best combination (ACC: %.3f): %s\\n' % (sfs1.k_score_, sfs1.k_feature_idx_)) print('all subsets:\\n', sfs1.subsets_) plot_sfs(sfs1.get_metric_dict(), kind='std_err'); best combination (ACC: 0.992): (0, 1, 2, 3, 6, 8, 9, 10, 11, 12) all subsets: {1: {'feature_idx': (6,), 'cv_scores': array([ 0.84615385, 0.6 , 0.88 , 0.79166667, 0.875 ]), 'avg_score': 0.7985641025641026, 'feature_names': ('6',)}, 2: {'feature_idx': (6, 9), 'cv_scores': array([ 0.92307692, 0.88 , 1. , 0.95833333, 0.91666667]), 'avg_score': 0.93561538461538463, 'feature_names': ('6', '9')}, 3: {'feature_idx': (6, 9, 12), 'cv_scores': array([ 0.92307692, 0.92 , 0.96 , 1. , 0.95833333]), 'avg_score': 0.95228205128205123, 'feature_names': ('6', '9', '12')}, 4: {'feature_idx': (3, 6, 9, 12), 'cv_scores': array([ 0.96153846, 0.96 , 0.96 , 1. , 0.95833333]), 'avg_score': 0.96797435897435891, 'feature_names': ('3', '6', '9', '12')}, 5: {'feature_idx': (3, 6, 9, 10, 12), 'cv_scores': array([ 0.92307692, 0.96 , 1. , 1. , 1. ]), 'avg_score': 0.97661538461538466, 'feature_names': ('3', '6', '9', '10', '12')}, 6: {'feature_idx': (2, 3, 6, 9, 10, 12), 'cv_scores': array([ 0.92307692, 0.96 , 1. , 0.95833333, 1. 
]), 'avg_score': 0.96828205128205125, 'feature_names': ('2', '3', '6', '9', '10', '12')}, 7: {'feature_idx': (0, 2, 3, 6, 9, 10, 12), 'cv_scores': array([ 0.92307692, 0.92 , 1. , 1. , 1. ]), 'avg_score': 0.96861538461538466, 'feature_names': ('0', '2', '3', '6', '9', '10', '12')}, 8: {'feature_idx': (0, 2, 3, 6, 8, 9, 10, 12), 'cv_scores': array([ 1. , 0.92, 1. , 1. , 1. ]), 'avg_score': 0.98399999999999999, 'feature_names': ('0', '2', '3', '6', '8', '9', '10', '12')}, 9: {'feature_idx': (0, 2, 3, 6, 8, 9, 10, 11, 12), 'cv_scores': array([ 1. , 0.92, 1. , 1. , 1. ]), 'avg_score': 0.98399999999999999, 'feature_names': ('0', '2', '3', '6', '8', '9', '10', '11', '12')}, 10: {'feature_idx': (0, 1, 2, 3, 6, 8, 9, 10, 11, 12), 'cv_scores': array([ 1. , 0.96, 1. , 1. , 1. ]), 'avg_score': 0.99199999999999999, 'feature_names': ('0', '1', '2', '3', '6', '8', '9', '10', '11', '12')}}","title":"Example 9 -- Selecting the \"best\" feature combination in a k-range"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-10-using-other-cross-validation-schemes","text":"In addition to standard k-fold and stratified k-fold, other cross validation schemes can be used with SequentialFeatureSelector . 
For example, GroupKFold or LeaveOneOut cross-validation from scikit-learn.","title":"Example 10 -- Using other cross-validation schemes"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#using-groupkfold-with-sequentialfeatureselector","text":"from mlxtend.feature_selection import SequentialFeatureSelector as SFS from sklearn.neighbors import KNeighborsClassifier from mlxtend.data import iris_data from sklearn.model_selection import GroupKFold import numpy as np X, y = iris_data() groups = np.arange(len(y)) // 10 print('groups: {}'.format(groups)) groups: [ 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 14] Calling the split() method of a scikit-learn cross-validator object will return a generator that yields train, test splits. cv_gen = GroupKFold(4).split(X, y, groups) cv_gen The cv parameter of SequentialFeatureSelector must be either an int or an iterable yielding train, test splits. This iterable can be constructed by passing the train, test split generator to the built-in list() function. 
cv = list(cv_gen) knn = KNeighborsClassifier(n_neighbors=2) sfs = SFS(estimator=knn, k_features=2, scoring='accuracy', cv=cv) sfs.fit(X, y) print('best combination (ACC: %.3f): %s\\n' % (sfs.k_score_, sfs.k_feature_idx_)) best combination (ACC: 0.940): (2, 3)","title":"Using GroupKFold with SequentialFeatureSelector"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-11-working-with-pandas-dataframes","text":"","title":"Example 11 - Working with pandas DataFrames"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#example-12-using-pandas-dataframes","text":"Optionally, we can also use pandas DataFrames and pandas Series as input to the fit function. In this case, the column names of the pandas DataFrame will be used as feature names. However, note that if custom_feature_names are provided in the fit function, these custom_feature_names take precedence over the DataFrame column-based feature names. import pandas as pd from sklearn.neighbors import KNeighborsClassifier from sklearn.datasets import load_iris from mlxtend.feature_selection import SequentialFeatureSelector as SFS iris = load_iris() X = iris.data y = iris.target knn = KNeighborsClassifier(n_neighbors=4) sfs1 = SFS(knn, k_features=3, forward=True, floating=False, scoring='accuracy', cv=0) X_df = pd.DataFrame(X, columns=['sepal len', 'sepal width', 'petal len', 'petal width']) X_df.head() sepal len sepal width petal len petal width 0 5.1 3.5 1.4 0.2 1 4.9 3.0 1.4 0.2 2 4.7 3.2 1.3 0.2 3 4.6 3.1 1.5 0.2 4 5.0 3.6 1.4 0.2 Also, the target array, y, can optionally be cast as a Series: y_series = pd.Series(y) y_series.head() 0 0 1 0 2 0 3 0 4 0 dtype: int64 sfs1 = sfs1.fit(X_df, y_series) Note that the only difference when passing a pandas DataFrame as input is that the 'feature_names' entries in the sfs1.subsets_ dictionary now contain the DataFrame's column names instead of string representations of the feature indices: 
sfs1.subsets_ {1: {'avg_score': 0.95999999999999996, 'cv_scores': array([ 0.96]), 'feature_idx': (3,), 'feature_names': ('petal width',)}, 2: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (2, 3), 'feature_names': ('petal len', 'petal width')}, 3: {'avg_score': 0.97333333333333338, 'cv_scores': array([ 0.97333333]), 'feature_idx': (1, 2, 3), 'feature_names': ('sepal width', 'petal len', 'petal width')}} In mlxtend version >= 0.13, pandas DataFrames are supported as feature inputs to the SequentialFeatureSelector in addition to NumPy arrays and other NumPy-like array types.","title":"Example 12 - Using Pandas DataFrames"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#api","text":"SequentialFeatureSelector(estimator, k_features=1, forward=True, floating=False, verbose=0, scoring=None, cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True) Sequential Feature Selection for Classification and Regression. Parameters estimator : scikit-learn classifier or regressor k_features : int or tuple or str (default: 1) Number of features to select, where k_features < the full feature set. New in 0.4.2: A tuple containing a min and max value can be provided, and the SFS will consider all feature combinations between min and max and return the one that scored highest in cross-validation. For example, the tuple (1, 4) will return any combination from 1 up to 4 features instead of a fixed number of features k. New in 0.8.0: A string argument \"best\" or \"parsimonious\". If \"best\" is provided, the feature selector will return the feature subset with the best cross-validation performance. If \"parsimonious\" is provided as an argument, the smallest feature subset that is within one standard error of the cross-validation performance will be selected. forward : bool (default: True) Forward selection if True, backward selection otherwise. floating : bool (default: False) Adds a conditional exclusion/inclusion if True. 
verbose : int (default: 0) Level of verbosity to use in logging. If 0, no output; if 1, the number of features in the current set; if 2, detailed logging including timestamp and CV scores at each step. scoring : str, callable, or None (default: None) If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it has to conform with sklearn's signature scorer(estimator, X, y); see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information. cv : int (default: 5) Integer or iterable yielding train, test splits. If cv is an integer and estimator is a classifier (or y consists of integer class labels), stratified k-fold cross-validation is performed. Otherwise, regular k-fold cross-validation is performed. No cross-validation if cv is None, False, or 0. n_jobs : int (default: 1) The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'. pre_dispatch : int, or string (default: '2*n_jobs') Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs. An int, giving the exact number of total jobs that are spawned. A string, giving an expression as a function of n_jobs, as in 2*n_jobs. clone_estimator : bool (default: True) Clones estimator if True; works with the original estimator instance if False. 
Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, it is required to set cv=0 and n_jobs=1. Attributes k_feature_idx_ : array-like, shape = [n_predictions] Feature indices of the selected feature subsets. k_feature_names_ : array-like, shape = [n_predictions] Feature names of the selected feature subsets. If pandas DataFrames are used in the fit method, the feature names correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. New in v 0.13.0. k_score_ : float Cross-validation average score of the selected subset. subsets_ : dict A dictionary of selected feature subsets during the sequential selection, where the dictionary keys are the lengths k of these feature subsets. The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset) 'feature_names' (tuple of feature names of the feature subset) 'cv_scores' (list of individual cross-validation scores) 'avg_score' (average cross-validation score) Note that if pandas DataFrames are used in the fit method, the 'feature_names' correspond to the column names. Otherwise, the feature names are string representations of the feature array indices. The 'feature_names' key is new in v 0.13.0. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/","title":"API"},{"location":"user_guide/feature_selection/SequentialFeatureSelector/#methods","text":"fit(X, y, custom_feature_names=None, **fit_params) Perform feature selection and learn model from training data. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. 
New in v 0.13.0: a pandas Series is now also accepted as argument for y. custom_feature_names : None or tuple (default: None) Custom feature names for self.k_feature_names_ and self.subsets_[i]['feature_names']. (new in v 0.13.0) fit_params : dict of string -> object, optional Parameters to pass to the fit method of classifier. Returns self : object fit_transform(X, y, **fit_params) Fit to training data, then reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. y : array-like, shape = [n_samples] Target values. New in v 0.13.0: a pandas Series is now also accepted as argument for y. fit_params : dict of string -> object, optional Parameters to pass to the fit method of classifier. Returns Reduced feature subset of X, shape={n_samples, k_features} get_metric_dict(confidence_interval=0.95) Return metric dictionary Parameters confidence_interval : float (default: 0.95) A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages. Returns Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx': tuple of the indices of the feature subset 'cv_scores': list with individual CV scores 'avg_score': CV average score 'std_dev': standard deviation of the CV score average 'std_err': standard error of the CV score average 'ci_bound': confidence interval bound of the CV score average get_params(deep=True) Get parameters for this estimator. Parameters deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. 
Returns params : mapping of string to any Parameter names mapped to their values. set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object. Returns self transform(X) Reduce X to its most important features. Parameters X : {array-like, sparse matrix}, shape = [n_samples, n_features] Training vectors, where n_samples is the number of samples and n_features is the number of features. New in v 0.13.0: pandas DataFrames are now also accepted as argument for X. Returns Reduced feature subset of X, shape={n_samples, k_features}","title":"Methods"},{"location":"user_guide/file_io/find_filegroups/","text":"Find Filegroups A function that finds files that belong together (i.e., differ only by file extension) in different directories and collects them in a Python dictionary for further processing tasks. from mlxtend.file_io import find_filegroups Overview This function finds files that are related to each other based on their file names. This can be useful for parsing collections of files that have been stored in different subdirectories, for example: input_dir/ task01.txt task02.txt ... log_dir/ task01.log task02.log ... output_dir/ task01.dat task02.dat ... 
References - Example 1 - Grouping related files in a dictionary Given the following directory and file structure: dir_1/ file_1.log file_2.log file_3.log dir_2/ file_1.csv file_2.csv file_3.csv dir_3/ file_1.txt file_2.txt file_3.txt we can use find_filegroups to group related files as items of a dictionary as shown below: from mlxtend.file_io import find_filegroups find_filegroups(paths=['./data_find_filegroups/dir_1', './data_find_filegroups/dir_2', './data_find_filegroups/dir_3'], substring='file_') {'file_1': ['./data_find_filegroups/dir_1/file_1.log', './data_find_filegroups/dir_2/file_1.csv', './data_find_filegroups/dir_3/file_1.txt'], 'file_2': ['./data_find_filegroups/dir_1/file_2.log', './data_find_filegroups/dir_2/file_2.csv', './data_find_filegroups/dir_3/file_2.txt'], 'file_3': ['./data_find_filegroups/dir_1/file_3.log', './data_find_filegroups/dir_2/file_3.csv', './data_find_filegroups/dir_3/file_3.txt']} API find_filegroups(paths, substring='', extensions=None, validity_check=True, ignore_invisible=True, rstrip='', ignore_substring=None) Find and collect files from different directories in a Python dictionary. Parameters paths : list Paths of the directories to be searched. Dictionary keys are built from the first directory. substring : str (default: '') Substring that all files have to contain to be considered. extensions : list (default: None) None or list of allowed file extensions for each path. If provided, the number of extensions must match the number of paths. validity_check : bool (default: True) If True, checks if all dictionary values have the same number of file paths. Prints a warning and returns an empty dictionary if the validity check failed. ignore_invisible : bool (default: True) If True, ignores invisible files (i.e., files starting with a period). rstrip : str (default: '') If provided, strips characters from the right side of the file base names after splitting the extension. Useful to trim different filenames to a common stem. 
E.g., \"abc_d.txt\" and \"abc_d_.csv\" would share the stem \"abc_d\" if rstrip is set to \"_\". ignore_substring : str (default: None) Ignores files that contain the specified substring. Returns groups : dict Dictionary of file paths. Keys are the file names found in the first directory listed in paths (without file extension). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_filegroups/","title":"Find Filegroups"},{"location":"user_guide/file_io/find_filegroups/#find-filegroups","text":"A function that finds files that belong together (i.e., differ only by file extension) in different directories and collects them in a Python dictionary for further processing tasks. from mlxtend.file_io import find_filegroups","title":"Find Filegroups"},{"location":"user_guide/file_io/find_filegroups/#overview","text":"This function finds files that are related to each other based on their file names. This can be useful for parsing collections of files that have been stored in different subdirectories, for example: input_dir/ task01.txt task02.txt ... log_dir/ task01.log task02.log ... 
output_dir/ task01.dat task02.dat ...","title":"Overview"},{"location":"user_guide/file_io/find_filegroups/#references","text":"-","title":"References"},{"location":"user_guide/file_io/find_filegroups/#example-1-grouping-related-files-in-a-dictionary","text":"Given the following directory and file structure: dir_1/ file_1.log file_2.log file_3.log dir_2/ file_1.csv file_2.csv file_3.csv dir_3/ file_1.txt file_2.txt file_3.txt we can use find_filegroups to group related files as items of a dictionary as shown below: from mlxtend.file_io import find_filegroups find_filegroups(paths=['./data_find_filegroups/dir_1', './data_find_filegroups/dir_2', './data_find_filegroups/dir_3'], substring='file_') {'file_1': ['./data_find_filegroups/dir_1/file_1.log', './data_find_filegroups/dir_2/file_1.csv', './data_find_filegroups/dir_3/file_1.txt'], 'file_2': ['./data_find_filegroups/dir_1/file_2.log', './data_find_filegroups/dir_2/file_2.csv', './data_find_filegroups/dir_3/file_2.txt'], 'file_3': ['./data_find_filegroups/dir_1/file_3.log', './data_find_filegroups/dir_2/file_3.csv', './data_find_filegroups/dir_3/file_3.txt']}","title":"Example 1 - Grouping related files in a dictionary"},{"location":"user_guide/file_io/find_filegroups/#api","text":"find_filegroups(paths, substring='', extensions=None, validity_check=True, ignore_invisible=True, rstrip='', ignore_substring=None) Find and collect files from different directories in a Python dictionary. Parameters paths : list Paths of the directories to be searched. Dictionary keys are built from the first directory. substring : str (default: '') Substring that all files have to contain to be considered. extensions : list (default: None) None or list of allowed file extensions for each path. If provided, the number of extensions must match the number of paths. validity_check : bool (default: True) If True, checks if all dictionary values have the same number of file paths. 
Prints a warning and returns an empty dictionary if the validity check failed. ignore_invisible : bool (default: True) If True, ignores invisible files (i.e., files starting with a period). rstrip : str (default: '') If provided, strips characters from the right side of the file base names after splitting the extension. Useful to trim different filenames to a common stem. E.g., \"abc_d.txt\" and \"abc_d_.csv\" would share the stem \"abc_d\" if rstrip is set to \"_\". ignore_substring : str (default: None) Ignores files that contain the specified substring. Returns groups : dict Dictionary of file paths. Keys are the file names found in the first directory listed in paths (without file extension). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_filegroups/","title":"API"},{"location":"user_guide/file_io/find_files/","text":"Find Files A function that finds files in a given directory based on substring matches and returns a list of the file names found. from mlxtend.file_io import find_files Overview This function finds files based on substring search. This is especially useful if we want to find specific files in a directory tree and return their absolute paths for further processing in Python. References - Example 1 - Grouping related files in a dictionary Given the following directory and file structure: dir_1/ file_1.log file_2.log file_3.log dir_2/ file_1.csv file_2.csv file_3.csv dir_3/ file_1.txt file_2.txt file_3.txt we can use find_files to return the paths to all files that contain the substring _2 as follows: from mlxtend.file_io import find_files find_files(substring='_2', path='./data_find_filegroups/', recursive=True) ['./data_find_filegroups/dir_1/file_2.log', './data_find_filegroups/dir_2/file_2.csv', './data_find_filegroups/dir_3/file_2.txt'] API find_files(substring, path, recursive=False, check_ext=None, ignore_invisible=True, ignore_substring=None) Find files in a directory based on substring matching. 
Parameters substring : str Substring of the file to be matched. path : str Path where to look. recursive : bool If true, searches subdirectories recursively. check_ext : str If string (e.g., '.txt'), only returns files that match the specified file extension. ignore_invisible : bool If True , ignores invisible files (i.e., files starting with a period). ignore_substring : str Ignores files that contain the specified substring. Returns results : list List of the matched files. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_files/","title":"Find Files"},{"location":"user_guide/file_io/find_files/#find-files","text":"A function that finds files in a given directory based on substring matches and returns a list of the file names found. from mlxtend.file_io import find_files","title":"Find Files"},{"location":"user_guide/file_io/find_files/#overview","text":"This function finds files based on substring search. This is especially useful if we want to find specific files in a directory tree and return their absolute paths for further processing in Python.","title":"Overview"},{"location":"user_guide/file_io/find_files/#references","text":"-","title":"References"},{"location":"user_guide/file_io/find_files/#example-1-grouping-related-files-in-a-dictionary","text":"Given the following directory and file structure dir_1/ file_1.log file_2.log file_3.log dir_2/ file_1.csv file_2.csv file_3.csv dir_3/ file_1.txt file_2.txt file_3.txt we can use find_files to return the paths to all files that contain the substring _2 as follows: from mlxtend.file_io import find_files find_files(substring='_2', path='./data_find_filegroups/', recursive=True) ['./data_find_filegroups/dir_1/file_2.log', './data_find_filegroups/dir_2/file_2.csv', './data_find_filegroups/dir_3/file_2.txt']","title":"Example 1 - Grouping related files in a dictionary"},{"location":"user_guide/file_io/find_files/#api","text":"find_files(substring, path, recursive=False, 
check_ext=None, ignore_invisible=True, ignore_substring=None) Find files in a directory based on substring matching. Parameters substring : str Substring of the file to be matched. path : str Path where to look. recursive : bool If true, searches subdirectories recursively. check_ext : str If string (e.g., '.txt'), only returns files that match the specified file extension. ignore_invisible : bool If True, ignores invisible files (i.e., files starting with a period). ignore_substring : str Ignores files that contain the specified substring. Returns results : list List of the matched files. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/file_io/find_files/","title":"API"},{"location":"user_guide/frequent_patterns/apriori/","text":"Frequent Itemsets via Apriori Algorithm Apriori function to extract frequent itemsets for association rule mining from mlxtend.frequent_patterns import apriori Overview Apriori is a popular algorithm [1] for extracting frequent itemsets with applications in association rule learning. The apriori algorithm has been designed to operate on databases containing transactions, such as purchases by customers of a store. An itemset is considered \"frequent\" if it meets a user-specified support threshold. For instance, if the support threshold is set to 0.5 (50%), a frequent itemset is defined as a set of items that occur together in at least 50% of all transactions in the database. References [1] Agrawal, Rakesh, and Ramakrishnan Srikant. \"Fast algorithms for mining association rules.\" Proc. 20th int. conf. very large data bases, VLDB. Vol. 1215. 1994. Example 1 -- Generating Frequent Itemsets The apriori function expects data in a one-hot encoded pandas DataFrame. 
Suppose we have the following transaction data: dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'], ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'], ['Milk', 'Apple', 'Kidney Beans', 'Eggs'], ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'], ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']] We can transform it into the right format via the TransactionEncoder as follows: import pandas as pd from mlxtend.preprocessing import TransactionEncoder te = TransactionEncoder() te_ary = te.fit(dataset).transform(dataset) df = pd.DataFrame(te_ary, columns=te.columns_) df Apple Corn Dill Eggs Ice cream Kidney Beans Milk Nutmeg Onion Unicorn Yogurt 0 False False False True False True True True True False True 1 False False True True False True False True True False True 2 True False False True False True True False False False False 3 False True False False False True True False False True True 4 False True False True True True False False True False False Now, let us return the items and itemsets with at least 60% support: from mlxtend.frequent_patterns import apriori apriori(df, min_support=0.6) support itemsets 0 0.8 (3) 1 1.0 (5) 2 0.6 (6) 3 0.6 (8) 4 0.6 (10) 5 0.8 (3, 5) 6 0.6 (8, 3) 7 0.6 (5, 6) 8 0.6 (8, 5) 9 0.6 (10, 5) 10 0.6 (8, 3, 5) By default, apriori returns the column indices of the items, which may be useful in downstream operations such as association rule mining. 
For better readability, we can set use_colnames=True to convert these integer values into the respective item names: apriori(df, min_support=0.6, use_colnames=True) support itemsets 0 0.8 (Eggs) 1 1.0 (Kidney Beans) 2 0.6 (Milk) 3 0.6 (Onion) 4 0.6 (Yogurt) 5 0.8 (Eggs, Kidney Beans) 6 0.6 (Onion, Eggs) 7 0.6 (Milk, Kidney Beans) 8 0.6 (Onion, Kidney Beans) 9 0.6 (Kidney Beans, Yogurt) 10 0.6 (Onion, Eggs, Kidney Beans) Example 2 -- Selecting and Filtering Results The advantage of working with pandas DataFrames is that we can use their convenient features to filter the results. For instance, let's assume we are only interested in itemsets of length 2 that have a support of at least 80 percent. First, we create the frequent itemsets via apriori and add a new column that stores the length of each itemset: frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True) frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x)) frequent_itemsets support itemsets length 0 0.8 (Eggs) 1 1 1.0 (Kidney Beans) 1 2 0.6 (Milk) 1 3 0.6 (Onion) 1 4 0.6 (Yogurt) 1 5 0.8 (Eggs, Kidney Beans) 2 6 0.6 (Onion, Eggs) 2 7 0.6 (Milk, Kidney Beans) 2 8 0.6 (Onion, Kidney Beans) 2 9 0.6 (Kidney Beans, Yogurt) 2 10 0.6 (Onion, Eggs, Kidney Beans) 3 Then, we can select the results that satisfy our desired criteria as follows: frequent_itemsets[ (frequent_itemsets['length'] == 2) & (frequent_itemsets['support'] >= 0.8) ] support itemsets length 5 0.8 (Eggs, Kidney Beans) 2 Similarly, using the Pandas API, we 
can select entries based on the \"itemsets\" column: frequent_itemsets[ frequent_itemsets['itemsets'] == {'Onion', 'Eggs'} ] support itemsets length 6 0.6 (Onion, Eggs) 2 Frozensets Note that the entries in the \"itemsets\" column are of type frozenset, which is a built-in Python type that is similar to a Python set but immutable, which makes it more efficient for certain query or comparison operations (https://docs.python.org/3.6/library/stdtypes.html#frozenset). Since frozensets are sets, the item order does not matter. I.e., the query frequent_itemsets[ frequent_itemsets['itemsets'] == {'Onion', 'Eggs'} ] is equivalent to any of the following three: frequent_itemsets[ frequent_itemsets['itemsets'] == {'Eggs', 'Onion'} ] frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Eggs', 'Onion')) ] frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Onion', 'Eggs')) ] Example 3 -- Working with Sparse Representations To save memory, you may want to represent your transaction data in the sparse format. This is especially useful if you have lots of products and small transactions. 
oht_ary = te.fit(dataset).transform(dataset, sparse=True) sparse_df = pd.SparseDataFrame(oht_ary, columns=te.columns_, default_fill_value=False) sparse_df Apple Corn Dill Eggs Ice cream Kidney Beans Milk Nutmeg Onion Unicorn Yogurt 0 False False False True False True True True True False True 1 False False True True False True False True True False True 2 True False False True False True True False False False False 3 False True False False False True True False False True True 4 False True False True True True False False True False False apriori(sparse_df, min_support=0.6, use_colnames=True) support itemsets 0 0.8 (Eggs) 1 1.0 (Kidney Beans) 2 0.6 (Milk) 3 0.6 (Onion) 4 0.6 (Yogurt) 5 0.8 (Eggs, Kidney Beans) 6 0.6 (Onion, Eggs) 7 0.6 (Milk, Kidney Beans) 8 0.6 (Onion, Kidney Beans) 9 0.6 (Kidney Beans, Yogurt) 10 0.6 (Onion, Eggs, Kidney Beans) API apriori(df, min_support=0.5, use_colnames=False, max_len=None, n_jobs=1) Get frequent itemsets from a one-hot DataFrame Parameters df : pandas DataFrame or pandas SparseDataFrame pandas DataFrame in one-hot encoded format. The allowed values are either 0/1 or True/False. For example, Apple Bananas Beer Chicken Milk Rice 0 1 0 1 1 0 1 1 1 0 1 0 0 1 2 1 0 1 0 0 0 3 1 1 0 0 0 0 4 0 0 1 1 1 1 5 0 0 1 0 1 1 6 0 0 1 0 1 0 7 1 1 0 0 0 0 min_support : float (default: 0.5) A float between 0 and 1 for minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions. use_colnames : bool (default: False) If true, uses the DataFrame's column names in the returned DataFrame instead of column indices. 
max_len : int (default: None) Maximum length of the itemsets generated. If None (default), all possible itemset lengths (under the apriori condition) are evaluated. Returns pandas DataFrame with columns ['support', 'itemsets'] of all itemsets that are >= min_support and whose length is <= max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/","title":"Apriori"},{"location":"user_guide/frequent_patterns/apriori/#frequent-itemsets-via-apriori-algorithm","text":"Apriori function to extract frequent itemsets for association rule mining from mlxtend.frequent_patterns import apriori","title":"Frequent Itemsets via Apriori Algorithm"},{"location":"user_guide/frequent_patterns/apriori/#overview","text":"Apriori is a popular algorithm [1] for extracting frequent itemsets with applications in association rule learning. The apriori algorithm has been designed to operate on databases containing transactions, such as purchases by customers of a store. An itemset is considered \"frequent\" if it meets a user-specified support threshold. For instance, if the support threshold is set to 0.5 (50%), a frequent itemset is defined as a set of items that occur together in at least 50% of all transactions in the database.","title":"Overview"},{"location":"user_guide/frequent_patterns/apriori/#references","text":"[1] Agrawal, Rakesh, and Ramakrishnan Srikant. \"Fast algorithms for mining association rules.\" Proc. 20th int. conf. very large data bases, VLDB. Vol. 1215. 1994.","title":"References"},{"location":"user_guide/frequent_patterns/apriori/#example-1-generating-frequent-itemsets","text":"The apriori function expects data in a one-hot encoded pandas DataFrame. 
Suppose we have the following transaction data: dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'], ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'], ['Milk', 'Apple', 'Kidney Beans', 'Eggs'], ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'], ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']] We can transform it into the right format via the TransactionEncoder as follows: import pandas as pd from mlxtend.preprocessing import TransactionEncoder te = TransactionEncoder() te_ary = te.fit(dataset).transform(dataset) df = pd.DataFrame(te_ary, columns=te.columns_) df Apple Corn Dill Eggs Ice cream Kidney Beans Milk Nutmeg Onion Unicorn Yogurt 0 False False False True False True True True True False True 1 False False True True False True False True True False True 2 True False False True False True True False False False False 3 False True False False False True True False False True True 4 False True False True True True False False True False False Now, let us return the items and itemsets with at least 60% support: from mlxtend.frequent_patterns import apriori apriori(df, min_support=0.6) support itemsets 0 0.8 (3) 1 1.0 (5) 2 0.6 (6) 3 0.6 (8) 4 0.6 (10) 5 0.8 (3, 5) 6 0.6 (8, 3) 7 0.6 (5, 6) 8 0.6 (8, 5) 9 0.6 (10, 5) 10 0.6 (8, 3, 5) By default, apriori returns the column indices of the items, which may be useful in downstream operations such as association rule mining. 
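The integer itemsets such as (3, 5) are column positions in the one-hot DataFrame. Each support value is simply the fraction of rows in which all of the itemset's columns are True. The minimal sketch below recomputes one row of the table above and assumes only pandas. It rebuilds the one-hot table by hand with alphabetically sorted columns instead of calling TransactionEncoder. The support helper is hypothetical and is not part of mlxtend.

```python
import pandas as pd

# The same five transactions as in the example above
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

# One-hot encode by hand: one boolean column per item, columns sorted
# alphabetically as in the table above (duplicate items count once)
items = sorted({item for transaction in dataset for item in transaction})
df = pd.DataFrame([[item in transaction for item in items]
                   for transaction in dataset], columns=items)


def support(df, itemset):
    # Fraction of transactions (rows) in which every item of the itemset is True
    return df[list(itemset)].all(axis=1).mean()


# Column positions 3 and 5 are the items behind the itemset (3, 5)
print(df.columns[3], df.columns[5])
print(support(df, ['Eggs', 'Kidney Beans']))
```

Eggs and Kidney Beans appear together in 4 of the 5 transactions, which gives the 0.8 support listed for the itemset (3, 5).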
For better readability, we can set use_colnames=True to convert these integer values into the respective item names: apriori(df, min_support=0.6, use_colnames=True) support itemsets 0 0.8 (Eggs) 1 1.0 (Kidney Beans) 2 0.6 (Milk) 3 0.6 (Onion) 4 0.6 (Yogurt) 5 0.8 (Eggs, Kidney Beans) 6 0.6 (Onion, Eggs) 7 0.6 (Milk, Kidney Beans) 8 0.6 (Onion, Kidney Beans) 9 0.6 (Kidney Beans, Yogurt) 10 0.6 (Onion, Eggs, Kidney Beans)","title":"Example 1 -- Generating Frequent Itemsets"},{"location":"user_guide/frequent_patterns/apriori/#example-2-selecting-and-filtering-results","text":"The advantage of working with pandas DataFrames is that we can use their convenient features to filter the results. For instance, let's assume we are only interested in itemsets of length 2 that have a support of at least 80 percent. First, we create the frequent itemsets via apriori and add a new column that stores the length of each itemset: frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True) frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x)) frequent_itemsets support itemsets length 0 0.8 (Eggs) 1 1 1.0 (Kidney Beans) 1 2 0.6 (Milk) 1 3 0.6 (Onion) 1 4 0.6 (Yogurt) 1 5 0.8 (Eggs, Kidney Beans) 2 6 0.6 (Onion, Eggs) 2 7 0.6 (Milk, Kidney Beans) 2 8 0.6 (Onion, Kidney Beans) 2 9 0.6 (Kidney Beans, Yogurt) 2 10 0.6 (Onion, Eggs, Kidney Beans) 3 Then, we can select the results that satisfy our desired criteria as follows: frequent_itemsets[ (frequent_itemsets['length'] == 2) & (frequent_itemsets['support'] >= 0.8) ] support itemsets length 5 0.8 (Eggs, Kidney Beans) 2 Similarly, using the Pandas API, we can select entries based on the \"itemsets\" column: frequent_itemsets[ frequent_itemsets['itemsets'] == {'Onion', 'Eggs'} ] support itemsets length 6 0.6 (Onion, Eggs) 2 Frozensets Note that the entries in the \"itemsets\" column are of type frozenset. This is a built-in Python type that is similar to a Python set but immutable, which makes it more efficient for certain query or comparison operations (https://docs.python.org/3.6/library/stdtypes.html#frozenset). Since frozensets are sets, the item order does not matter. I.e., the query frequent_itemsets[ frequent_itemsets['itemsets'] == {'Onion', 'Eggs'} ] is equivalent to any of the following three: frequent_itemsets[ frequent_itemsets['itemsets'] == {'Eggs', 'Onion'} ] frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Eggs', 'Onion')) ] frequent_itemsets[ frequent_itemsets['itemsets'] == frozenset(('Onion', 'Eggs')) ]","title":"Example 2 -- Selecting and Filtering Results"},{"location":"user_guide/frequent_patterns/apriori/#example-3-working-with-sparse-representations","text":"To save memory, you may want to represent your transaction data in the sparse format. This is especially useful if you have lots of products and small transactions. 
te_ary = te.fit(dataset).transform(dataset, sparse=True) sparse_df = pd.SparseDataFrame(te_ary, columns=te.columns_, default_fill_value=False) sparse_df Apple Corn Dill Eggs Ice cream Kidney Beans Milk Nutmeg Onion Unicorn Yogurt 0 False False False True False True True True True False True 1 False False True True False True False True True False True 2 True False False True False True True False False False False 3 False True False False False True True False False True True 4 False True False True True True False False True False False apriori(sparse_df, min_support=0.6, use_colnames=True) support itemsets 0 0.8 (Eggs) 1 1.0 (Kidney Beans) 2 0.6 (Milk) 3 0.6 (Onion) 4 0.6 (Yogurt) 5 0.8 (Eggs, Kidney Beans) 6 0.6 (Onion, Eggs) 7 0.6 (Milk, Kidney Beans) 8 0.6 (Onion, Kidney Beans) 9 0.6 (Kidney Beans, Yogurt) 10 0.6 (Onion, Eggs, Kidney Beans)","title":"Example 3 -- Working with Sparse Representations"},{"location":"user_guide/frequent_patterns/apriori/#api","text":"apriori(df, min_support=0.5, use_colnames=False, max_len=None, n_jobs=1) Get frequent itemsets from a one-hot DataFrame Parameters df : pandas DataFrame or pandas SparseDataFrame A pandas DataFrame in one-hot encoded format. The allowed values are either 0/1 or True/False. For example, Apple Bananas Beer Chicken Milk Rice 0 1 0 1 1 0 1 1 1 0 1 0 0 1 2 1 0 1 0 0 0 3 1 1 0 0 0 0 4 0 0 1 1 1 1 5 0 0 1 0 1 1 6 0 0 1 0 1 0 7 1 1 0 0 0 0 min_support : float (default: 0.5) A float between 0 and 1 for the minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions. 
use_colnames : bool (default: False) If True, uses the DataFrame's column names in the returned DataFrame instead of column indices. max_len : int (default: None) Maximum length of the itemsets generated. If None (default), all possible itemset lengths (under the apriori condition) are evaluated. Returns pandas DataFrame with columns ['support', 'itemsets'] of all itemsets that are >= min_support and whose length is <= max_len (if max_len is not None). Each itemset in the 'itemsets' column is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/","title":"API"},{"location":"user_guide/frequent_patterns/association_rules/","text":"Association Rules Generation from Frequent Itemsets Function to generate association rules from frequent itemsets from mlxtend.frequent_patterns import association_rules Overview Rule generation is a common task in the mining of frequent patterns. An association rule is an implication expression of the form X \\rightarrow Y, where X and Y are disjoint itemsets [1]. A more concrete example based on consumer behaviour would be \\{Diapers\\} \\rightarrow \\{Beer\\}, suggesting that people who buy diapers are also likely to buy beer. To evaluate the \"interest\" of such an association rule, different metrics have been developed. The current implementation supports the support, confidence, lift, leverage, and conviction metrics. Metrics The currently supported metrics for evaluating association rules and setting selection thresholds are listed below. Given a rule \"A -> C\", A stands for antecedent and C stands for consequent. 'support': \\text{support}(A\\rightarrow C) = \\text{support}(A \\cup C), \\;\\;\\; \\text{range: } [0, 1] introduced in [3] The support metric is defined for itemsets, not association rules. 
The table produced by the association rule mining algorithm contains three different support metrics: 'antecedent support', 'consequent support', and 'support'. Here, 'antecedent support' computes the proportion of transactions that contain the antecedent A, and 'consequent support' computes the support for the itemset of the consequent C. The 'support' metric then computes the support of the combined itemset A \\cup C. Note that 'support' is bounded above by min('antecedent support', 'consequent support'). Typically, support is used to measure the abundance or frequency (often interpreted as significance or importance) of an itemset in a database. We refer to an itemset as a \"frequent itemset\" if its support is larger than a specified minimum-support threshold. Note that in general, due to the downward closure property, all subsets of a frequent itemset are also frequent. 'confidence': \\text{confidence}(A\\rightarrow C) = \\frac{\\text{support}(A\\rightarrow C)}{\\text{support}(A)}, \\;\\;\\; \\text{range: } [0, 1] introduced in [3] The confidence of a rule A->C is the probability of seeing the consequent in a transaction given that it also contains the antecedent. Note that the metric is directed, not symmetric; for instance, the confidence for A->C can differ from the confidence for C->A. The confidence is 1 (maximal) for a rule A->C if the consequent occurs in every transaction that contains the antecedent. 'lift': \\text{lift}(A\\rightarrow C) = \\frac{\\text{confidence}(A\\rightarrow C)}{\\text{support}(C)}, \\;\\;\\; \\text{range: } [0, \\infty] introduced in [4] The lift metric is commonly used to measure how much more often the antecedent and consequent of a rule A->C occur together than we would expect if they were statistically independent. If A and C are independent, the lift score will be exactly 1. 
'leverage': \\text{leverage}(A\\rightarrow C) = \\text{support}(A\\rightarrow C) - \\text{support}(A) \\times \\text{support}(C), \\;\\;\\; \\text{range: } [-1, 1] introduced in [5] Leverage computes the difference between the observed frequency of A and C appearing together and the frequency that would be expected if A and C were independent. A leverage value of 0 indicates independence. 'conviction': \\text{conviction}(A\\rightarrow C) = \\frac{1 - \\text{support}(C)}{1 - \\text{confidence}(A\\rightarrow C)}, \\;\\;\\; \\text{range: } [0, \\infty] introduced in [6] A high conviction value means that the consequent is highly dependent on the antecedent. For instance, in the case of a perfect confidence score, the denominator becomes 0 (due to 1 - 1), and the conviction score is then defined as 'inf'. Similar to lift, if items are independent, the conviction is 1. References [1] Tan, Steinbach, Kumar. Introduction to Data Mining. Pearson New International Edition. Harlow: Pearson Education Ltd., 2014. (pp. 327-414). [2] Michael Hahsler, http://michael.hahsler.net/research/association_rules/measures.html [3] R. Agrawal, T. Imielinski, and A. Swami. Mining associations between sets of items in large databases. In Proc. of the ACM SIGMOD Int'l Conference on Management of Data, pages 207-216, Washington D.C., May 1993 [4] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data [5] Piatetsky-Shapiro, G., Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, 1991: p. 229-248. [6] Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. 
In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, pages 255-264, Tucson, Arizona, USA, May 1997 Example 1 -- Generating Association Rules from Frequent Itemsets The association_rules function takes DataFrames of frequent itemsets as produced by the apriori function in mlxtend.frequent_patterns. To demonstrate the usage of the association_rules function, we first create a pandas DataFrame of frequent itemsets as generated by the apriori function: import pandas as pd from mlxtend.preprocessing import TransactionEncoder from mlxtend.frequent_patterns import apriori dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'], ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'], ['Milk', 'Apple', 'Kidney Beans', 'Eggs'], ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'], ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']] te = TransactionEncoder() te_ary = te.fit(dataset).transform(dataset) df = pd.DataFrame(te_ary, columns=te.columns_) frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True) frequent_itemsets support itemsets 0 0.8 (Eggs) 1 1.0 (Kidney Beans) 2 0.6 (Milk) 3 0.6 (Onion) 4 0.6 (Yogurt) 5 0.8 (Kidney Beans, Eggs) 6 0.6 (Onion, Eggs) 7 0.6 (Milk, Kidney Beans) 8 0.6 (Onion, Kidney Beans) 9 0.6 (Kidney Beans, Yogurt) 10 0.6 (Onion, Kidney Beans, Eggs) The association_rules() function allows you to (1) specify your metric of interest and (2) the corresponding threshold. Currently implemented measures are support, confidence, lift, leverage, and conviction. 
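Every metric column in the rule tables of this example can be recomputed from three numbers: the antecedent support, the consequent support, and the support of the combined itemset. The minimal plain-Python sketch below applies the metric formulas given earlier to two rules from this example. The helper rule_metrics is hypothetical and is not the mlxtend implementation. The support values come from the frequent-itemsets table above: Eggs 0.8, Onion 0.6, and Onion with Eggs together 0.6.

```python
import math


def rule_metrics(support_a, support_c, support_ac):
    # support_a: antecedent support, support_c: consequent support,
    # support_ac: support of the combined itemset (A and C together)
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # With a perfect confidence the denominator 1 - confidence is 0,
    # in which case conviction is defined as infinity
    if confidence == 1:
        conviction = math.inf
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {'support': support_ac, 'confidence': confidence, 'lift': lift,
            'leverage': leverage, 'conviction': conviction}


# (Eggs) -> (Onion): antecedent support 0.8, consequent support 0.6, support 0.6
print(rule_metrics(0.8, 0.6, 0.6))
# (Onion) -> (Eggs): antecedent support 0.6, consequent support 0.8, support 0.6
print(rule_metrics(0.6, 0.8, 0.6))
```

Up to floating-point rounding, the first call gives the (Eggs) -> (Onion) values: confidence 0.75, lift 1.25, leverage 0.12, conviction 1.6. The second call shows that the perfect confidence of (Onion) -> (Eggs) makes its conviction infinite.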
Let's say you are interested in rules derived from the frequent itemsets only if the level of confidence is at least 70 percent (min_threshold=0.7): from mlxtend.frequent_patterns import association_rules association_rules(frequent_itemsets, metric=\"confidence\", min_threshold=0.7) antecedents consequents antecedent support consequent support support confidence lift leverage conviction 0 (Kidney Beans) (Eggs) 1.0 0.8 0.8 0.80 1.00 0.00 1.000000 1 (Eggs) (Kidney Beans) 0.8 1.0 0.8 1.00 1.00 0.00 inf 2 (Onion) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 3 (Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 4 (Milk) (Kidney Beans) 0.6 1.0 0.6 1.00 1.00 0.00 inf 5 (Onion) (Kidney Beans) 0.6 1.0 0.6 1.00 1.00 0.00 inf 6 (Yogurt) (Kidney Beans) 0.6 1.0 0.6 1.00 1.00 0.00 inf 7 (Onion, Kidney Beans) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 8 (Onion, Eggs) (Kidney Beans) 0.6 1.0 0.6 1.00 1.00 0.00 inf 9 (Kidney Beans, Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 10 (Onion) (Kidney Beans, Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 11 (Eggs) (Onion, Kidney Beans) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 Example 2 -- Rule Generation and Selection Criteria If you are interested in rules according to a different metric of interest, you can simply adjust the metric and min_threshold arguments. E.g. 
if you are only interested in rules that have a lift score of >= 1.2, you would do the following: rules = association_rules(frequent_itemsets, metric=\"lift\", min_threshold=1.2) rules antecedents consequents antecedent support consequent support support confidence lift leverage conviction 0 (Onion) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 1 (Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 2 (Onion, Kidney Beans) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 3 (Kidney Beans, Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 4 (Onion) (Kidney Beans, Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 5 (Eggs) (Onion, Kidney Beans) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 Pandas DataFrames make it easy to filter the results further. Let's say we are only interested in rules that satisfy the following criteria: at least 2 items in the antecedent, a confidence > 0.75, and a lift score > 1.2. We could compute the antecedent length as follows: rules[\"antecedent_len\"] = rules[\"antecedents\"].apply(lambda x: len(x)) rules antecedents consequents antecedent support consequent support support confidence lift leverage conviction antecedent_len 0 (Onion) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 1 1 (Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 1 2 (Onion, Kidney Beans) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 2 3 (Kidney Beans, Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 2 4 (Onion) (Kidney Beans, Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 1 5 (Eggs) (Onion, Kidney Beans) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 1 Then, we can use pandas' selection syntax as shown below: rules[ (rules['antecedent_len'] >= 2) & (rules['confidence'] > 0.75) & (rules['lift'] > 1.2) ] antecedents consequents antecedent support consequent support support confidence lift leverage conviction antecedent_len 2 (Onion, Kidney Beans) (Eggs) 0.6 0.8 0.6 1.0 1.25 0.12 inf 2 Similarly, using the Pandas API, we can select entries based on the \"antecedents\" or \"consequents\" columns: rules[rules['antecedents'] == {'Eggs', 'Kidney Beans'}] antecedents consequents antecedent support consequent support support confidence lift leverage conviction antecedent_len 3 (Kidney Beans, Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.6 2 Frozensets Note that the entries in the \"antecedents\" and \"consequents\" columns are of type frozenset. This is a built-in Python type that is similar to a Python set but immutable, which makes it more efficient for certain query or comparison operations (https://docs.python.org/3.6/library/stdtypes.html#frozenset). Since frozensets are sets, the item order does not matter. I.e., the query rules[rules['antecedents'] == {'Eggs', 'Kidney Beans'}] is equivalent to any of the following three: rules[rules['antecedents'] == {'Kidney Beans', 'Eggs'}] rules[rules['antecedents'] == frozenset(('Eggs', 'Kidney Beans'))] rules[rules['antecedents'] == frozenset(('Kidney Beans', 'Eggs'))] Example 3 -- Frequent Itemsets with Incomplete Antecedent and Consequent Information Most metrics computed by association_rules depend on the consequent and antecedent support scores of a given rule provided in the frequent itemset input DataFrame. 
Consider the following example: import pandas as pd data = {'itemsets': [['177', '176'], ['177', '179'], ['176', '178'], ['176', '179'], ['93', '100'], ['177', '178'], ['177', '176', '178']], 'support':[0.253623, 0.253623, 0.217391, 0.217391, 0.181159, 0.108696, 0.108696]} freq_itemsets = pd.DataFrame(data) freq_itemsets itemsets support 0 [177, 176] 0.253623 1 [177, 179] 0.253623 2 [176, 178] 0.217391 3 [176, 179] 0.217391 4 [93, 100] 0.181159 5 [177, 178] 0.108696 6 [177, 176, 178] 0.108696 Note that this is a \"cropped\" DataFrame that doesn't contain the support values of the item subsets. This can create problems if we want to compute the association rule metrics for, e.g., 176 => 177. For example, the confidence is computed as \\text{confidence}(A\\rightarrow C) = \\frac{\\text{support}(A\\rightarrow C)}{\\text{support}(A)}, \\;\\;\\; \\text{range: } [0, 1] But we do not have \\text{support}(A). All we know about \"A\"'s support is that it is at least 0.253623. 
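The minimal Python sketch below shows why the cropped table is not enough. It plugs a few candidate values for the missing support(176) into the confidence formula. The value 0.253623 comes from the table. The other candidate values are hypothetical and only illustrate the possible range.

```python
# Support of the combined itemset {176, 177}, taken from the cropped table
support_ac = 0.253623

# support(176) is missing. By the downward closure property it is at least
# support({176, 177}) and at most 1.0; the middle value is hypothetical.
candidate_supports = [0.253623, 0.5, 1.0]

# confidence(176 -> 177) = support({176, 177}) / support(176)
confidences = [support_ac / support_a for support_a in candidate_supports]

for support_a, conf in zip(candidate_supports, confidences):
    print(f'support(176) = {support_a:.6f} -> confidence = {conf:.6f}')
```

Any confidence between 0.253623 and 1.0 is consistent with the cropped table, so no single value can be reported for 176 => 177.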
In these scenarios, where not all metrics can be computed due to incomplete input DataFrames, you can use the support_only=True option. It only computes the support column of a given rule, which does not require as much information: \\text{support}(A\\rightarrow C) = \\text{support}(A \\cup C), \\;\\;\\; \\text{range: } [0, 1] NaN values will be assigned to all other metric columns: from mlxtend.frequent_patterns import association_rules res = association_rules(freq_itemsets, support_only=True, min_threshold=0.1) res antecedents consequents antecedent support consequent support support confidence lift leverage conviction 0 (176) (177) NaN NaN 0.253623 NaN NaN NaN NaN 1 (177) (176) NaN NaN 0.253623 NaN NaN NaN NaN 2 (179) (177) NaN NaN 0.253623 NaN NaN NaN NaN 3 (177) (179) NaN NaN 0.253623 NaN NaN NaN NaN 4 (176) (178) NaN NaN 0.217391 NaN NaN NaN NaN 5 (178) (176) NaN NaN 0.217391 NaN NaN NaN NaN 6 (179) (176) NaN NaN 0.217391 NaN NaN NaN NaN 7 (176) (179) NaN NaN 0.217391 NaN NaN NaN NaN 8 (93) (100) NaN NaN 0.181159 NaN NaN NaN NaN 9 (100) (93) NaN NaN 0.181159 NaN NaN NaN NaN 10 (177) (178) NaN NaN 0.108696 NaN NaN NaN NaN 11 (178) (177) NaN NaN 0.108696 NaN NaN NaN NaN 12 (176, 177) (178) NaN NaN 0.108696 NaN NaN NaN NaN 13 (176, 178) (177) NaN NaN 0.108696 NaN NaN NaN NaN 14 (177, 178) (176) NaN NaN 0.108696 NaN NaN NaN NaN 15 (176) (177, 178) NaN NaN 0.108696 NaN NaN NaN NaN 16 (177) (176, 178) NaN NaN 0.108696 NaN NaN NaN NaN 17 (178) (176, 177) NaN NaN 0.108696 NaN NaN NaN NaN To clean up the representation, you may want to do the following: res = res[['antecedents', 'consequents', 'support']] res antecedents consequents support 0 (176) (177) 0.253623 1 
(177) (176) 0.253623 2 (179) (177) 0.253623 3 (177) (179) 0.253623 4 (176) (178) 0.217391 5 (178) (176) 0.217391 6 (179) (176) 0.217391 7 (176) (179) 0.217391 8 (93) (100) 0.181159 9 (100) (93) 0.181159 10 (177) (178) 0.108696 11 (178) (177) 0.108696 12 (176, 177) (178) 0.108696 13 (176, 178) (177) 0.108696 14 (177, 178) (176) 0.108696 15 (176) (177, 178) 0.108696 16 (177) (176, 178) 0.108696 17 (178) (176, 177) 0.108696 API association_rules(df, metric='confidence', min_threshold=0.8, support_only=False) Generates a DataFrame of association rules including the metrics 'support', 'confidence', 'lift', 'leverage', and 'conviction' Parameters df : pandas DataFrame A pandas DataFrame of frequent itemsets with columns ['support', 'itemsets'] metric : string (default: 'confidence') Metric to evaluate if a rule is of interest. Automatically set to 'support' if support_only=True. Otherwise, supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'. These metrics are computed as follows: - support(A->C) = support(A+C) [aka 'support'], range: [0, 1] - confidence(A->C) = support(A+C) / support(A), range: [0, 1] - lift(A->C) = confidence(A->C) / support(C), range: [0, inf] - leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1] - conviction = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf] min_threshold : float (default: 0.8) Minimal threshold for the evaluation metric, via the metric parameter, to decide whether a candidate rule is of interest. support_only : bool (default: False) Only computes the rule support and fills the other metric columns with NaNs. This is useful if: a) the input DataFrame is incomplete, e.g., does not contain support values for all rule antecedents and consequents, or b) you simply want to speed up the computation because you don't need the other metrics. 
Returns pandas DataFrame with columns \"antecedents\" and \"consequents\" that store itemsets, plus the scoring metric columns: \"antecedent support\", \"consequent support\", \"support\", \"confidence\", \"lift\", \"leverage\", \"conviction\" of all rules for which metric(rule) >= min_threshold. Each entry in the \"antecedents\" and \"consequents\" columns is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/","title":"Association rules"},{"location":"user_guide/frequent_patterns/association_rules/#association-rules-generation-from-frequent-itemsets","text":"Function to generate association rules from frequent itemsets from mlxtend.frequent_patterns import association_rules","title":"Association Rules Generation from Frequent Itemsets"},{"location":"user_guide/frequent_patterns/association_rules/#overview","text":"Rule generation is a common task in the mining of frequent patterns. An association rule is an implication expression of the form X \\rightarrow Y, where X and Y are disjoint itemsets [1]. A more concrete example based on consumer behaviour would be \\{Diapers\\} \\rightarrow \\{Beer\\}, suggesting that people who buy diapers are also likely to buy beer. To evaluate the \"interest\" of such an association rule, different metrics have been developed. The current implementation supports the support, confidence, lift, leverage, and conviction metrics.","title":"Overview"},{"location":"user_guide/frequent_patterns/association_rules/#metrics","text":"The currently supported metrics for evaluating association rules and setting selection thresholds are listed below. 
Given a rule \"A -> C\", A stands for antecedent and C stands for consequent.","title":"Metrics"},{"location":"user_guide/frequent_patterns/association_rules/#support","text":"\\text{support}(A\\rightarrow C) = \\text{support}(A \\cup C), \\;\\;\\; \\text{range: } [0, 1] introduced in [3] The support metric is defined for itemsets, not association rules. The table produced by the association rule mining algorithm contains three different support metrics: 'antecedent support', 'consequent support', and 'support'. Here, 'antecedent support' computes the proportion of transactions that contain the antecedent A, and 'consequent support' computes the support for the itemset of the consequent C. The 'support' metric then computes the support of the combined itemset A \\cup C. Note that 'support' is bounded above by min('antecedent support', 'consequent support'). Typically, support is used to measure the abundance or frequency (often interpreted as significance or importance) of an itemset in a database. We refer to an itemset as a \"frequent itemset\" if its support is larger than a specified minimum-support threshold. Note that in general, due to the downward closure property, all subsets of a frequent itemset are also frequent.","title":"'support':"},{"location":"user_guide/frequent_patterns/association_rules/#confidence","text":"\\text{confidence}(A\\rightarrow C) = \\frac{\\text{support}(A\\rightarrow C)}{\\text{support}(A)}, \\;\\;\\; \\text{range: } [0, 1] introduced in [3] The confidence of a rule A->C is the probability of seeing the consequent in a transaction given that it also contains the antecedent. Note that the metric is directed, not symmetric; for instance, the confidence for A->C can differ from the confidence for C->A. 
The confidence is 1 (maximal) for a rule A->C if the consequent occurs in every transaction that contains the antecedent.","title":"'confidence':"},{"location":"user_guide/frequent_patterns/association_rules/#lift","text":"\\text{lift}(A\\rightarrow C) = \\frac{\\text{confidence}(A\\rightarrow C)}{\\text{support}(C)}, \\;\\;\\; \\text{range: } [0, \\infty] introduced in [4] The lift metric is commonly used to measure how much more often the antecedent and consequent of a rule A->C occur together than we would expect if they were statistically independent. If A and C are independent, the lift score will be exactly 1.","title":"'lift':"},{"location":"user_guide/frequent_patterns/association_rules/#leverage","text":"\\text{leverage}(A\\rightarrow C) = \\text{support}(A\\rightarrow C) - \\text{support}(A) \\times \\text{support}(C), \\;\\;\\; \\text{range: } [-1, 1] introduced in [5] Leverage computes the difference between the observed frequency of A and C appearing together and the frequency that would be expected if A and C were independent. A leverage value of 0 indicates independence.","title":"'leverage':"},{"location":"user_guide/frequent_patterns/association_rules/#conviction","text":"\\text{conviction}(A\\rightarrow C) = \\frac{1 - \\text{support}(C)}{1 - \\text{confidence}(A\\rightarrow C)}, \\;\\;\\; \\text{range: } [0, \\infty] introduced in [6] A high conviction value means that the consequent is highly dependent on the antecedent. For instance, in the case of a perfect confidence score, the denominator becomes 0 (due to 1 - 1), and the conviction score is then defined as 'inf'. Similar to lift, if items are independent, the conviction is 1.","title":"'conviction':"},{"location":"user_guide/frequent_patterns/association_rules/#references","text":"[1] Tan, Steinbach, Kumar. Introduction to Data Mining. Pearson New International Edition. Harlow: Pearson Education Ltd., 2014. (pp. 327-414). 
[2] Michael Hahsler, http://michael.hahsler.net/research/association_rules/measures.html [3] R. Agrawal, T. Imielinski, and A. Swami. Mining associations between sets of items in large databases. In Proc. of the ACM SIGMOD Int'l Conference on Management of Data, pages 207-216, Washington D.C., May 1993 [4] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data [5] Piatetsky-Shapiro, G., Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, 1991: p. 229-248. [6] Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, pages 255-264, Tucson, Arizona, USA, May 1997","title":"References"},{"location":"user_guide/frequent_patterns/association_rules/#example-1-generating-association-rules-from-frequent-itemsets","text":"The association_rules function takes DataFrames of frequent itemsets as produced by the apriori function in mlxtend.frequent_patterns. 
To demonstrate the usage of the association_rules function, we first create a pandas DataFrame of frequent itemsets as generated by the apriori function: import pandas as pd from mlxtend.preprocessing import TransactionEncoder from mlxtend.frequent_patterns import apriori dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'], ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'], ['Milk', 'Apple', 'Kidney Beans', 'Eggs'], ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'], ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']] te = TransactionEncoder() te_ary = te.fit(dataset).transform(dataset) df = pd.DataFrame(te_ary, columns=te.columns_) frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True) frequent_itemsets support itemsets 0 0.8 (Eggs) 1 1.0 (Kidney Beans) 2 0.6 (Milk) 3 0.6 (Onion) 4 0.6 (Yogurt) 5 0.8 (Kidney Beans, Eggs) 6 0.6 (Onion, Eggs) 7 0.6 (Milk, Kidney Beans) 8 0.6 (Onion, Kidney Beans) 9 0.6 (Kidney Beans, Yogurt) 10 0.6 (Onion, Kidney Beans, Eggs) The association_rules() function allows you to (1) specify your metric of interest and (2) the corresponding threshold. Currently implemented measures are support, confidence, lift, leverage, and conviction. 
Let's say you are interested in rules derived from the frequent itemsets only if the level of confidence is at least 70 percent (min_threshold=0.7): from mlxtend.frequent_patterns import association_rules association_rules(frequent_itemsets, metric=\"confidence\", min_threshold=0.7) antecedents consequents antecedent support consequent support support confidence lift leverage conviction 0 (Kidney Beans) (Eggs) 1.0 0.8 0.8 0.80 1.00 0.00 1.000000 1 (Eggs) (Kidney Beans) 0.8 1.0 0.8 1.00 1.00 0.00 inf 2 (Onion) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 3 (Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 4 (Milk) (Kidney Beans) 0.6 1.0 0.6 1.00 1.00 0.00 inf 5 (Onion) (Kidney Beans) 0.6 1.0 0.6 1.00 1.00 0.00 inf 6 (Yogurt) (Kidney Beans) 0.6 1.0 0.6 1.00 1.00 0.00 inf 7 (Onion, Kidney Beans) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 8 (Onion, Eggs) (Kidney Beans) 0.6 1.0 0.6 1.00 1.00 0.00 inf 9 (Kidney Beans, Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 10 (Onion) (Kidney Beans, Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 11 (Eggs) (Onion, Kidney Beans) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000","title":"Example 1 -- Generating Association Rules from Frequent Itemsets"},{"location":"user_guide/frequent_patterns/association_rules/#example-2-rule-generation-and-selection-criteria","text":"If you are interested in rules according to a different metric of interest, you can simply adjust the metric and min_threshold arguments. E.g.,
if you are only interested in rules that have a lift score of at least 1.2, you would do the following: rules = association_rules(frequent_itemsets, metric=\"lift\", min_threshold=1.2) rules antecedents consequents antecedent support consequent support support confidence lift leverage conviction 0 (Onion) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 1 (Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 2 (Onion, Kidney Beans) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 3 (Kidney Beans, Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 4 (Onion) (Kidney Beans, Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 5 (Eggs) (Onion, Kidney Beans) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 Pandas DataFrames make it easy to filter the results further. Let's say we are only interested in rules that satisfy the following criteria: at least 2 antecedents, a confidence > 0.75, and a lift score > 1.2. We could compute the antecedent length as follows: rules[\"antecedent_len\"] = rules[\"antecedents\"].apply(lambda x: len(x)) rules antecedents consequents antecedent support consequent support support confidence lift leverage conviction antecedent_len 0 (Onion) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 1 1 (Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 1 2 (Onion, Kidney Beans) (Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 2 3 (Kidney Beans, Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 2 4 (Onion) (Kidney Beans, Eggs) 0.6 0.8 0.6 1.00 1.25 0.12 inf 1 5 (Eggs) (Onion, Kidney Beans) 0.8 0.6 0.6 0.75 1.25 0.12 1.600000 1 Then, we can use pandas' selection syntax as shown below: rules[ (rules['antecedent_len'] >= 2) & (rules['confidence'] > 0.75) & (rules['lift'] > 1.2) ]
antecedents consequents antecedent support consequent support support confidence lift leverage conviction antecedent_len 2 (Onion, Kidney Beans) (Eggs) 0.6 0.8 0.6 1.0 1.25 0.12 inf 2 Similarly, using the pandas API, we can select entries based on the \"antecedents\" or \"consequents\" columns: rules[rules['antecedents'] == {'Eggs', 'Kidney Beans'}] antecedents consequents antecedent support consequent support support confidence lift leverage conviction antecedent_len 3 (Kidney Beans, Eggs) (Onion) 0.8 0.6 0.6 0.75 1.25 0.12 1.6 2 Frozensets Note that the entries in the \"antecedents\" and \"consequents\" columns are of type frozenset, which is a built-in Python type that is similar to a Python set but immutable, which makes it more efficient for certain query or comparison operations (https://docs.python.org/3.6/library/stdtypes.html#frozenset). Since frozensets are sets, the item order does not matter. I.e., the query rules[rules['antecedents'] == {'Eggs', 'Kidney Beans'}] is equivalent to any of the following three: rules[rules['antecedents'] == {'Kidney Beans', 'Eggs'}] rules[rules['antecedents'] == frozenset(('Eggs', 'Kidney Beans'))] rules[rules['antecedents'] == frozenset(('Kidney Beans', 'Eggs'))]","title":"Example 2 -- Rule Generation and Selection Criteria"},{"location":"user_guide/frequent_patterns/association_rules/#example-3-frequent-itemsets-with-incomplete-antecedent-and-consequent-information","text":"Most metrics computed by association_rules depend on the antecedent and consequent support scores of a given rule, as provided in the frequent itemset input DataFrame. 
Consider the following example: import pandas as pd data = {'itemsets': [['177', '176'], ['177', '179'], ['176', '178'], ['176', '179'], ['93', '100'], ['177', '178'], ['177', '176', '178']], 'support':[0.253623, 0.253623, 0.217391, 0.217391, 0.181159, 0.108696, 0.108696]} freq_itemsets = pd.DataFrame(data) freq_itemsets itemsets support 0 [177, 176] 0.253623 1 [177, 179] 0.253623 2 [176, 178] 0.217391 3 [176, 179] 0.217391 4 [93, 100] 0.181159 5 [177, 178] 0.108696 6 [177, 176, 178] 0.108696 Note that this is a \"cropped\" DataFrame that doesn't contain the support values of the item subsets. This can create problems if we want to compute the association rule metrics for, e.g., 176 => 177. For example, the confidence is computed as \\text{confidence}(A\\rightarrow C) = \\frac{\\text{support}(A\\rightarrow C)}{\\text{support}(A)}, \\;\\;\\; \\text{range: } [0, 1] But we do not have \\text{support}(A). All we know about the support of A is that it is at least 0.253623. 
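Rule support is the one metric that remains computable here, since support(A->C) = support(A+C) is exactly the support stored for the itemset itself. As an illustration of that idea (not mlxtend's actual implementation), the plain-Python sketch below uses a hypothetical helper, candidate_rules, that splits every itemset into all antecedent/consequent pairs and attaches the itemset's support; its min_support argument plays the role of min_threshold=0.1 in the call that follows.

```python
from itertools import combinations

# The cropped frequent itemsets from the example above: (items, support)
cropped_itemsets = [
    (['177', '176'], 0.253623),
    (['177', '179'], 0.253623),
    (['176', '178'], 0.217391),
    (['176', '179'], 0.217391),
    (['93', '100'], 0.181159),
    (['177', '178'], 0.108696),
    (['177', '176', '178'], 0.108696),
]


def candidate_rules(frequent, min_support=0.1):
    # Hypothetical helper: returns (antecedent, consequent, support) tuples
    rules = []
    for items, support in frequent:
        if support < min_support:
            continue
        items = frozenset(items)
        # Every non-empty proper subset can serve as the antecedent;
        # the remaining items form the consequent.
        for size in range(1, len(items)):
            for antecedent in combinations(sorted(items), size):
                antecedent = frozenset(antecedent)
                consequent = items - antecedent
                # Only support(A+C) is needed, which is the itemset support.
                rules.append((antecedent, consequent, support))
    return rules


rules = candidate_rules(cropped_itemsets, min_support=0.1)
print(len(rules))
for antecedent, consequent, support in rules[:4]:
    print(sorted(antecedent), '->', sorted(consequent), support)
```

For these seven itemsets, the sketch yields 18 candidate rules, one for each row of the support_only output.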
In these scenarios, where not all metrics can be computed due to incomplete input DataFrames, you can use the support_only=True option, which only computes the support column of a given rule, since rule support does not require as much information: \\text{support}(A\\rightarrow C) = \\text{support}(A \\cup C), \\;\\;\\; \\text{range: } [0, 1] NaN values will be assigned to all other metric columns: from mlxtend.frequent_patterns import association_rules res = association_rules(freq_itemsets, support_only=True, min_threshold=0.1) res antecedents consequents antecedent support consequent support support confidence lift leverage conviction 0 (176) (177) NaN NaN 0.253623 NaN NaN NaN NaN 1 (177) (176) NaN NaN 0.253623 NaN NaN NaN NaN 2 (179) (177) NaN NaN 0.253623 NaN NaN NaN NaN 3 (177) (179) NaN NaN 0.253623 NaN NaN NaN NaN 4 (176) (178) NaN NaN 0.217391 NaN NaN NaN NaN 5 (178) (176) NaN NaN 0.217391 NaN NaN NaN NaN 6 (179) (176) NaN NaN 0.217391 NaN NaN NaN NaN 7 (176) (179) NaN NaN 0.217391 NaN NaN NaN NaN 8 (93) (100) NaN NaN 0.181159 NaN NaN NaN NaN 9 (100) (93) NaN NaN 0.181159 NaN NaN NaN NaN 10 (177) (178) NaN NaN 0.108696 NaN NaN NaN NaN 11 (178) (177) NaN NaN 0.108696 NaN NaN NaN NaN 12 (176, 177) (178) NaN NaN 0.108696 NaN NaN NaN NaN 13 (176, 178) (177) NaN NaN 0.108696 NaN NaN NaN NaN 14 (177, 178) (176) NaN NaN 0.108696 NaN NaN NaN NaN 15 (176) (177, 178) NaN NaN 0.108696 NaN NaN NaN NaN 16 (177) (176, 178) NaN NaN 0.108696 NaN NaN NaN NaN 17 (178) (176, 177) NaN NaN 0.108696 NaN NaN NaN NaN To clean up the representation, you may want to do the following: res = res[['antecedents', 'consequents', 'support']] res antecedents consequents support 0 (176) (177) 0.253623 1
(177) (176) 0.253623 2 (179) (177) 0.253623 3 (177) (179) 0.253623 4 (176) (178) 0.217391 5 (178) (176) 0.217391 6 (179) (176) 0.217391 7 (176) (179) 0.217391 8 (93) (100) 0.181159 9 (100) (93) 0.181159 10 (177) (178) 0.108696 11 (178) (177) 0.108696 12 (176, 177) (178) 0.108696 13 (176, 178) (177) 0.108696 14 (177, 178) (176) 0.108696 15 (176) (177, 178) 0.108696 16 (177) (176, 178) 0.108696 17 (178) (176, 177) 0.108696","title":"Example 3 -- Frequent Itemsets with Incomplete Antecedent and Consequent Information"},{"location":"user_guide/frequent_patterns/association_rules/#api","text":"association_rules(df, metric='confidence', min_threshold=0.8, support_only=False) Generates a DataFrame of association rules including the metrics 'support', 'confidence', and 'lift' Parameters df : pandas DataFrame pandas DataFrame of frequent itemsets with columns ['support', 'itemsets'] metric : string (default: 'confidence') Metric to evaluate if a rule is of interest. Automatically set to 'support' if support_only=True. Otherwise, supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'. These metrics are computed as follows: - support(A->C) = support(A+C) [aka 'support'], range: [0, 1] - confidence(A->C) = support(A+C) / support(A), range: [0, 1] - lift(A->C) = confidence(A->C) / support(C), range: [0, inf] - leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1] - conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf] min_threshold : float (default: 0.8) Minimal threshold for the evaluation metric, via the metric parameter, to decide whether a candidate rule is of interest. support_only : bool (default: False) Only computes the rule support and fills the other metric columns with NaNs. This is useful if: a) the input DataFrame is incomplete, e.g., does not contain support values for all rule antecedents and consequents, or b) you simply want to speed up the computation because you don't need the other metrics. 
Returns pandas DataFrame with columns \"antecedents\" and \"consequents\" that store itemsets, plus the scoring metric columns: \"antecedent support\", \"consequent support\", \"support\", \"confidence\", \"lift\", \"leverage\", \"conviction\" of all rules for which metric(rule) >= min_threshold. Each entry in the \"antecedents\" and \"consequents\" columns is of type frozenset, which is a Python built-in type that behaves similarly to sets except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset). Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/","title":"API"},{"location":"user_guide/general_concepts/activation-functions/","text":"Activation Functions for Artificial Neural Networks","title":"Activation Functions for Artificial Neural Networks"},{"location":"user_guide/general_concepts/activation-functions/#activation-functions-for-artificial-neural-networks","text":"","title":"Activation Functions for Artificial Neural Networks"},{"location":"user_guide/general_concepts/gradient-optimization/","text":"Gradient Descent and Stochastic Gradient Descent Gradient Descent (GD) Optimization Using the Gradient Descent optimization algorithm, the weights are updated incrementally after each epoch (= pass over the training dataset). Compatible cost functions J(\\cdot) Sum of squared errors (SSE) [ mlxtend.regressor.LinearRegression , mlxtend.classifier.Adaline ]: J(\\mathbf{w}) = \\frac{1}{2} \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})^2 Logistic Cost (cross-entropy) [ mlxtend.classifier.LogisticRegression ]: ... The magnitude and direction of the weight update are computed by taking a step in the opposite direction of the cost gradient \\Delta w_j = -\\eta \\frac{\\partial J}{\\partial w_j}, where \\eta is the learning rate. 
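A minimal plain-Python sketch of this batch update, assuming a linear model with the identity activation (output = w^T x) and a constant feature x_0 = 1 as the bias unit; batch_gd and sse are hypothetical helper names, not mlxtend's implementation. The weight updates are accumulated over the whole training set and applied once per epoch:

```python
def batch_gd(X, y, eta=0.01, epochs=500):
    # X: list of samples, each a list of features with x_0 = 1 (bias unit)
    # y: list of target values
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(epochs):
        # Accumulate Delta w_j = eta * sum_i (target_i - output_i) * x_ij
        delta = [0.0] * n_features
        for xi, target in zip(X, y):
            output = sum(wj * xj for wj, xj in zip(w, xi))
            error = target - output
            for j in range(n_features):
                delta[j] += eta * error * xi[j]
        # A single weight update per pass over the training set
        w = [wj + dj for wj, dj in zip(w, delta)]
    return w


def sse(X, y, w):
    # J(w) = 1/2 * sum_i (target_i - output_i)^2
    return 0.5 * sum((t - sum(wj * xj for wj, xj in zip(w, xi))) ** 2
                     for xi, t in zip(X, y))


# Toy data generated from y = 1 + 2x (first column is the bias unit x_0 = 1)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = batch_gd(X, y, eta=0.05, epochs=2000)
print(w, sse(X, y, w))
```

The learning rate has to be small enough for the updates to converge; with eta=0.05 on this toy data the weights approach (1, 2).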
The weights are then updated after each epoch via the following update rule: \\mathbf{w} := \\mathbf{w} + \\Delta\\mathbf{w}, where \\Delta\\mathbf{w} is a vector that contains the weight updates of each weight coefficient w_j, which are computed as follows: \\Delta w_j = -\\eta \\frac{\\partial J}{\\partial w_j}\\\\ = -\\eta \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})(-x_{j}^{(i)})\\\\ = \\eta \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})x_{j}^{(i)}. Essentially, we can picture Gradient Descent optimization as a hiker (the weight coefficient) who wants to climb down a mountain (cost function) into a valley (cost minimum), and each step is determined by the steepness of the slope (gradient) and the leg length of the hiker (learning rate). Considering a cost function with only a single weight coefficient, we can illustrate this concept as follows: Stochastic Gradient Descent (SGD) In Gradient Descent optimization, we compute the cost gradient based on the complete training set; hence, we sometimes also call it batch gradient descent. In the case of very large datasets, using Gradient Descent can be quite costly since we are only taking a single step for one pass over the training set -- thus, the larger the training set, the slower our algorithm updates the weights and the longer it may take until it converges to the global cost minimum (note that the SSE cost function is convex). In Stochastic Gradient Descent (sometimes also referred to as iterative or on-line gradient descent), we don't accumulate the weight updates as we've seen above for Gradient Descent: for one or more epochs: for each weight j w_j := w_j + \\Delta w_j , where: \\Delta w_j= \\eta \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})x_{j}^{(i)} Instead, we update the weights after each training sample: for one or more epochs, or until approx.
cost minimum is reached: for training sample i : for each weight j w_j := w_j + \\Delta w_j , where: \\Delta w_j= \\eta (\\text{target}^{(i)} - \\text{output}^{(i)})x_{j}^{(i)} Here, the term \"stochastic\" comes from the fact that the gradient based on a single training sample is a \"stochastic approximation\" of the \"true\" cost gradient. Due to its stochastic nature, the path towards the global cost minimum is not \"direct\" as in Gradient Descent, but may go \"zig-zag\" if we visualize the cost surface in a 2D space. However, it has been shown that Stochastic Gradient Descent almost surely converges to the global cost minimum if the cost function is convex (or pseudo-convex) and the learning rate decreases suitably over time [1]. Stochastic Gradient Descent Shuffling There are several different flavors of stochastic gradient descent, all of which can be found throughout the literature. Let's take a look at the three most common variants: A) randomly shuffle samples in the training set for one or more epochs, or until approx. cost minimum is reached for training sample i compute gradients and perform weight updates B) for one or more epochs, or until approx. cost minimum is reached randomly shuffle samples in the training set for training sample i compute gradients and perform weight updates C) for iterations t , or until approx. cost minimum is reached: draw random sample from the training set compute gradients and perform weight updates In scenario A [3], we shuffle the training set only once, at the beginning; whereas in scenario B, we shuffle the training set after each epoch to prevent repeating update cycles. In both scenario A and scenario B, each training sample is only used once per epoch to update the model weights. In scenario C, we draw the training samples randomly with replacement from the training set [2]. If the number of iterations t is equal to the number of training samples, we learn the model based on a bootstrap sample of the training set. 
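Variant B can be sketched as follows for the same linear SSE model (a hypothetical sgd helper in plain Python, using the standard-library random module for shuffling; an illustration, not mlxtend's implementation). In contrast to the batch version, the weights change after every single training sample:

```python
import random


def sgd(X, y, eta=0.01, epochs=200, seed=1):
    # X: list of samples with x_0 = 1 as the bias unit; y: list of targets
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    indices = list(range(len(X)))
    for _ in range(epochs):
        # Variant B: reshuffle the training set at the start of every epoch
        rng.shuffle(indices)
        for i in indices:
            output = sum(wj * xj for wj, xj in zip(w, X[i]))
            error = y[i] - output
            # Update immediately, using the gradient of this one sample only
            w = [wj + eta * error * xj for wj, xj in zip(w, X[i])]
    return w


# Toy data generated from y = 1 + 2x (first column is the bias unit x_0 = 1)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = sgd(X, y, eta=0.05, epochs=1000)
print(w)
```

Because the toy data lie exactly on the line y = 1 + 2x, even a constant learning rate drives the weights to (1, 2) here; on noisy data, SGD with a constant learning rate typically keeps fluctuating around the minimum.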
Mini-Batch Gradient Descent (MB-GD) Mini-Batch Gradient Descent (MB-GD) is a compromise between batch GD and SGD. In MB-GD, we update the model based on smaller groups of training samples; instead of computing the gradient from 1 sample (SGD) or all n training samples (GD), we compute the gradient from 1 < k < n training samples (a common mini-batch size is k=50 ). MB-GD converges in fewer iterations than GD because we update the weights more frequently; however, MB-GD lets us utilize vectorized operations, which typically results in a computational performance gain over SGD. Learning Rates An adaptive learning rate \\eta : Choosing a decrease constant d that shrinks the learning rate over time: \\eta(t+1) := \\eta(t) / (1 + t \\times d) Momentum learning, which adds a fraction \\alpha of the previous weight update to the current weight update for faster convergence: \\Delta \\mathbf{w}_{t+1} := -\\eta \\nabla J(\\mathbf{w}_{t}) + \\alpha \\Delta \\mathbf{w}_{t} References [1] Bottou, L\u00e9on (1998). \"Online Algorithms and Stochastic Approximations\". Online Learning and Neural Networks. Cambridge University Press. ISBN 978-0-521-65263-6. [2] Bottou, L\u00e9on. \"Large-scale machine learning with stochastic gradient descent.\" Proceedings of COMPSTAT'2010. Physica-Verlag HD, 2010. 177-186. [3] Bottou, L\u00e9on. \"Stochastic gradient descent tricks.\" Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 421-436.","title":"Gradient Descent and Stochastic Gradient Descent"},{"location":"user_guide/general_concepts/gradient-optimization/#gradient-descent-and-stochastic-gradient-descent","text":"","title":"Gradient Descent and Stochastic Gradient Descent"},{"location":"user_guide/general_concepts/gradient-optimization/#gradient-descent-gd-optimization","text":"Using the Gradient Descent optimization algorithm, the weights are updated incrementally after each epoch (= pass over the training dataset). 
Compatible cost functions J(\\cdot) Sum of squared errors (SSE) [ mlxtend.regressor.LinearRegression , mlxtend.classifier.Adaline ]: J(\\mathbf{w}) = \\frac{1}{2} \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})^2 Logistic Cost (cross-entropy) [ mlxtend.classifier.LogisticRegression ]: ... The magnitude and direction of the weight update are computed by taking a step in the opposite direction of the cost gradient \\Delta w_j = -\\eta \\frac{\\partial J}{\\partial w_j}, where \\eta is the learning rate. The weights are then updated after each epoch via the following update rule: \\mathbf{w} := \\mathbf{w} + \\Delta\\mathbf{w}, where \\Delta\\mathbf{w} is a vector that contains the weight updates of each weight coefficient w_j, which are computed as follows: \\Delta w_j = -\\eta \\frac{\\partial J}{\\partial w_j}\\\\ = -\\eta \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})(-x_{j}^{(i)})\\\\ = \\eta \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})x_{j}^{(i)}. Essentially, we can picture Gradient Descent optimization as a hiker (the weight coefficient) who wants to climb down a mountain (cost function) into a valley (cost minimum), and each step is determined by the steepness of the slope (gradient) and the leg length of the hiker (learning rate). Considering a cost function with only a single weight coefficient, we can illustrate this concept as follows:","title":"Gradient Descent (GD) Optimization"},{"location":"user_guide/general_concepts/gradient-optimization/#stochastic-gradient-descent-sgd","text":"In Gradient Descent optimization, we compute the cost gradient based on the complete training set; hence, we sometimes also call it batch gradient descent. 
In the case of very large datasets, using Gradient Descent can be quite costly since we are only taking a single step for one pass over the training set -- thus, the larger the training set, the slower our algorithm updates the weights and the longer it may take until it converges to the global cost minimum (note that the SSE cost function is convex). In Stochastic Gradient Descent (sometimes also referred to as iterative or on-line gradient descent), we don't accumulate the weight updates as we've seen above for Gradient Descent: for one or more epochs: for each weight j w_j := w_j + \\Delta w_j , where: \\Delta w_j= \\eta \\sum_i (\\text{target}^{(i)} - \\text{output}^{(i)})x_{j}^{(i)} Instead, we update the weights after each training sample: for one or more epochs, or until approx. cost minimum is reached: for training sample i : for each weight j w_j := w_j + \\Delta w_j , where: \\Delta w_j= \\eta (\\text{target}^{(i)} - \\text{output}^{(i)})x_{j}^{(i)} Here, the term \"stochastic\" comes from the fact that the gradient based on a single training sample is a \"stochastic approximation\" of the \"true\" cost gradient. Due to its stochastic nature, the path towards the global cost minimum is not \"direct\" as in Gradient Descent, but may go \"zig-zag\" if we visualize the cost surface in a 2D space. However, it has been shown that Stochastic Gradient Descent almost surely converges to the global cost minimum if the cost function is convex (or pseudo-convex) and the learning rate decreases suitably over time [1].","title":"Stochastic Gradient Descent (SGD)"},{"location":"user_guide/general_concepts/gradient-optimization/#stochastic-gradient-descent-shuffling","text":"There are several different flavors of stochastic gradient descent, all of which can be found throughout the literature. 
Let's take a look at the three most common variants:","title":"Stochastic Gradient Descent Shuffling"},{"location":"user_guide/general_concepts/gradient-optimization/#a","text":"randomly shuffle samples in the training set for one or more epochs, or until approx. cost minimum is reached for training sample i compute gradients and perform weight updates","title":"A)"},{"location":"user_guide/general_concepts/gradient-optimization/#b","text":"for one or more epochs, or until approx. cost minimum is reached randomly shuffle samples in the training set for training sample i compute gradients and perform weight updates","title":"B)"},{"location":"user_guide/general_concepts/gradient-optimization/#c","text":"for iterations t , or until approx. cost minimum is reached: draw random sample from the training set compute gradients and perform weight updates In scenario A [3], we shuffle the training set only once, at the beginning; whereas in scenario B, we shuffle the training set after each epoch to prevent repeating update cycles. In both scenario A and scenario B, each training sample is only used once per epoch to update the model weights. In scenario C, we draw the training samples randomly with replacement from the training set [2]. If the number of iterations t is equal to the number of training samples, we learn the model based on a bootstrap sample of the training set.","title":"C)"},{"location":"user_guide/general_concepts/gradient-optimization/#mini-batch-gradient-descent-mb-gd","text":"Mini-Batch Gradient Descent (MB-GD) is a compromise between batch GD and SGD. In MB-GD, we update the model based on smaller groups of training samples; instead of computing the gradient from 1 sample (SGD) or all n training samples (GD), we compute the gradient from 1 < k < n training samples (a common mini-batch size is k=50 ). 
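The mini-batch scheme can be sketched by splitting each shuffled epoch into consecutive batches of k samples. This is a hypothetical minibatch_gd helper in plain Python for the linear SSE model, not mlxtend's implementation; k=2 is used only because the toy data set has four samples, and a NumPy version would compute each batch gradient with a single matrix product, which is where the vectorization benefit comes from.

```python
import random


def minibatch_gd(X, y, k=2, eta=0.05, epochs=500, seed=0):
    # X: samples with x_0 = 1 as the bias unit; y: targets; k: mini-batch size
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)
        # Split the shuffled indices into consecutive mini-batches of size k
        for start in range(0, len(indices), k):
            batch = indices[start:start + k]
            delta = [0.0] * len(w)
            for i in batch:
                error = y[i] - sum(wj * xj for wj, xj in zip(w, X[i]))
                for j, xj in enumerate(X[i]):
                    delta[j] += eta * error * xj
            # One update per mini-batch rather than per sample or per epoch
            w = [wj + dj for wj, dj in zip(w, delta)]
    return w


# Toy data generated from y = 1 + 2x (first column is the bias unit x_0 = 1)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = minibatch_gd(X, y, k=2, eta=0.05, epochs=1000)
print(w)
```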
MB-GD converges in fewer iterations than GD because we update the weights more frequently; however, MB-GD lets us utilize vectorized operations, which typically results in a computational performance gain over SGD.","title":"Mini-Batch Gradient Descent (MB-GD)"},{"location":"user_guide/general_concepts/gradient-optimization/#learning-rates","text":"An adaptive learning rate \\eta : Choosing a decrease constant d that shrinks the learning rate over time: \\eta(t+1) := \\eta(t) / (1 + t \\times d) Momentum learning, which adds a fraction \\alpha of the previous weight update to the current weight update for faster convergence: \\Delta \\mathbf{w}_{t+1} := -\\eta \\nabla J(\\mathbf{w}_{t}) + \\alpha \\Delta \\mathbf{w}_{t}","title":"Learning Rates"},{"location":"user_guide/general_concepts/gradient-optimization/#references","text":"[1] Bottou, L\u00e9on (1998). \"Online Algorithms and Stochastic Approximations\". Online Learning and Neural Networks. Cambridge University Press. ISBN 978-0-521-65263-6. [2] Bottou, L\u00e9on. \"Large-scale machine learning with stochastic gradient descent.\" Proceedings of COMPSTAT'2010. Physica-Verlag HD, 2010. 177-186. [3] Bottou, L\u00e9on. \"Stochastic gradient descent tricks.\" Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 421-436.","title":"References"},{"location":"user_guide/general_concepts/linear-gradient-derivative/","text":"Deriving the Gradient Descent Rule for Linear Regression and Adaline Linear Regression and Adaptive Linear Neurons (Adalines) are closely related to each other. 
In fact, the Adaline algorithm is identical to linear regression except for a threshold function \\phi(\\cdot)_T that converts the continuous output into a categorical class label \\phi(z)_T = \\begin{cases} 1 & if \\; z \\geq 0 \\\\ 0 & if \\; z < 0 \\end{cases}, where z is the net input, which is computed as the sum of the input features \\mathbf{x} multiplied by the model weights \\mathbf{w} : z = w_0x_0 + w_1x_1 + \\dots + w_mx_m = \\sum_{j=0}^{m} x_j w_j = \\mathbf{w}^T \\mathbf{x} (Note that x_0 refers to the bias unit so that x_0=1 .) In the case of linear regression and Adaline, the activation function \\phi(\\cdot)_A is simply the identity function so that \\phi(z)_A = z . Now, in order to learn the optimal model weights \\mathbf{w} , we need to define a cost function that we can optimize. Here, our cost function J({\\cdot}) is the sum of squared errors (SSE), which we multiply by \\frac{1}{2} to make the derivation easier: J({\\mathbf{w}}) = \\frac{1}{2} \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)^2, where y^{(i)} is the label or target value of the i th training point x^{(i)} . (Note that the SSE cost function is convex and differentiable.) In simple words, we can summarize the gradient descent learning as follows: Initialize the weights to 0 or small random numbers. For k epochs (passes over the training set) For each training sample x^{(i)} Compute the predicted output value \\hat{y}^{(i)} Compare \\hat{y}^{(i)} to the actual output y^{(i)} and compute the \"weight update\" value Accumulate the \"weight update\" values Update the weight coefficients by the accumulated \"weight update\" values Which we can translate into a more mathematical notation: Initialize the weights to 0 or small random numbers. 
For k epochs For each training sample x^{(i)} \\phi(z^{(i)})_A = \\hat{y}^{(i)} \\Delta w_{(t+1), \\; j} = \\eta (y^{(i)} - \\hat{y}^{(i)}) x_{j}^{(i)}\\; (where \\eta is the learning rate); \\Delta w_{j} := \\Delta w_j\\; + \\Delta w_{(t+1), \\;j} \\mathbf{w} := \\mathbf{w} + \\Delta \\mathbf{w} Performing this global weight update \\mathbf{w} := \\mathbf{w} + \\Delta \\mathbf{w} can be understood as \"updating the model weights by taking a step in the opposite direction of the cost gradient, scaled by the learning rate \\eta \" \\Delta \\mathbf{w} = - \\eta \\nabla J(\\mathbf{w}), where the partial derivative with respect to each w_j can be written as \\frac{\\partial J}{\\partial w_j} = - \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) x_{j}^{(i)}. To summarize: in order to use gradient descent to learn the model coefficients, we simply update the weights \\mathbf{w} by taking a step in the opposite direction of the gradient for each pass over the training set -- that's basically it. But how do we get to the equation \\frac{\\partial J}{\\partial w_j} = - \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) x_{j}^{(i)}? Let's walk through the derivation step by step. 
\\begin{aligned} & \\frac{\\partial J}{\\partial w_j} \\\\ & = \\frac{\\partial}{\\partial w_j} \\frac{1}{2} \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)^2 \\\\ & = \\frac{1}{2} \\frac{\\partial}{\\partial w_j} \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)^2 \\\\ & = \\frac{1}{2} \\sum_i 2 \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) \\frac{\\partial}{\\partial w_j} \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) \\\\ & = \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) \\frac{\\partial}{\\partial w_j} \\bigg(y^{(i)} - \\sum_{k=0}^{m} w_{k} x_{k}^{(i)} \\bigg) \\\\ & = \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)(-x_{j}^{(i)}) \\\\ & = - \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)x_{j}^{(i)} \\end{aligned}","title":"Deriving the Gradient Descent Rule for Linear Regression and Adaline"},{"location":"user_guide/general_concepts/linear-gradient-derivative/#deriving-the-gradient-descent-rule-for-linear-regression-and-adaline","text":"Linear Regression and Adaptive Linear Neurons (Adalines) are closely related to each other. In fact, the Adaline algorithm is identical to linear regression except for a threshold function \\phi(\\cdot)_T that converts the continuous output into a categorical class label \\phi(z)_T = \\begin{cases} 1 & if \\; z \\geq 0 \\\\ 0 & if \\; z < 0 \\end{cases}, where z is the net input, which is computed as the sum of the input features \\mathbf{x} multiplied by the model weights \\mathbf{w} : z = w_0x_0 + w_1x_1 + \\dots + w_mx_m = \\sum_{j=0}^{m} x_j w_j = \\mathbf{w}^T \\mathbf{x} (Note that x_0 refers to the bias unit so that x_0=1 .) In the case of linear regression and Adaline, the activation function \\phi(\\cdot)_A is simply the identity function so that \\phi(z)_A = z . Now, in order to learn the optimal model weights \\mathbf{w} , we need to define a cost function that we can optimize. 
Here, our cost function J({\\cdot}) is the sum of squared errors (SSE), which we multiply by \\frac{1}{2} to make the derivation easier: J({\\mathbf{w}}) = \\frac{1}{2} \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)^2, where y^{(i)} is the label or target value of the i th training point x^{(i)} . (Note that the SSE cost function is convex and differentiable.) In simple words, we can summarize the gradient descent learning as follows: Initialize the weights to 0 or small random numbers. For k epochs (passes over the training set) For each training sample x^{(i)} Compute the predicted output value \\hat{y}^{(i)} Compare \\hat{y}^{(i)} to the actual output y^{(i)} and compute the \"weight update\" value Accumulate the \"weight update\" values Update the weight coefficients by the accumulated \"weight update\" values Which we can translate into a more mathematical notation: Initialize the weights to 0 or small random numbers. For k epochs For each training sample x^{(i)} \\phi(z^{(i)})_A = \\hat{y}^{(i)} \\Delta w_{(t+1), \\; j} = \\eta (y^{(i)} - \\hat{y}^{(i)}) x_{j}^{(i)}\\; (where \\eta is the learning rate); \\Delta w_{j} := \\Delta w_j\\; + \\Delta w_{(t+1), \\;j} \\mathbf{w} := \\mathbf{w} + \\Delta \\mathbf{w} Performing this global weight update \\mathbf{w} := \\mathbf{w} + \\Delta \\mathbf{w} can be understood as \"updating the model weights by taking a step in the opposite direction of the cost gradient, scaled by the learning rate \\eta \" \\Delta \\mathbf{w} = - \\eta \\nabla J(\\mathbf{w}), where the partial derivative with respect to each w_j can be written as \\frac{\\partial J}{\\partial w_j} = - \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) x_{j}^{(i)}. To summarize: in order to use gradient descent to learn the model coefficients, we simply update the weights \\mathbf{w} by taking a step in the opposite direction of the gradient for each pass over the training set -- that's basically it. 
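Before walking through the derivation, the result can be checked numerically. The plain-Python sketch below (sse_cost, analytic_grad, and numeric_grad are hypothetical helper names) compares the analytic partial derivatives, -sum_i (y_i - phi(z_i)) x_ij with the identity activation, against central finite differences of the cost J(w):

```python
def sse_cost(w, X, y):
    # J(w) = 1/2 * sum_i (y_i - phi(z_i))^2 with identity activation phi(z) = z
    return 0.5 * sum((yi - sum(wj * xj for wj, xj in zip(w, xi))) ** 2
                     for xi, yi in zip(X, y))


def analytic_grad(w, X, y):
    # dJ/dw_j = -sum_i (y_i - phi(z_i)) * x_ij
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        error = yi - sum(wj * xj for wj, xj in zip(w, xi))
        for j, xj in enumerate(xi):
            grad[j] -= error * xj
    return grad


def numeric_grad(w, X, y, h=1e-6):
    # Central finite differences: (J(w + h e_j) - J(w - h e_j)) / (2h)
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += h
        w_minus[j] -= h
        grad.append((sse_cost(w_plus, X, y) - sse_cost(w_minus, X, y)) / (2 * h))
    return grad


# Arbitrary toy values; the first column is the bias unit x_0 = 1
X = [[1.0, 0.5, -1.0], [1.0, 2.0, 0.3], [1.0, -1.5, 0.8]]
y = [0.7, 1.9, -0.4]
w = [0.1, -0.2, 0.3]
print(analytic_grad(w, X, y))
print(numeric_grad(w, X, y))
```

Because J(w) is quadratic in each weight, the central difference is exact up to floating-point rounding, so the two printed gradients should agree to many decimal places.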
But how do we get to the equation \\frac{\\partial J}{\\partial w_j} = - \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) x_{j}^{(i)}? Let's walk through the derivation step by step. \\begin{aligned} & \\frac{\\partial J}{\\partial w_j} \\\\ & = \\frac{\\partial}{\\partial w_j} \\frac{1}{2} \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)^2 \\\\ & = \\frac{1}{2} \\frac{\\partial}{\\partial w_j} \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)^2 \\\\ & = \\frac{1}{2} \\sum_i 2 \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) \\frac{\\partial}{\\partial w_j} \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) \\\\ & = \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big) \\frac{\\partial}{\\partial w_j} \\bigg(y^{(i)} - \\sum_{k=0}^{m} w_{k} x_{k}^{(i)} \\bigg) \\\\ & = \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)(-x_{j}^{(i)}) \\\\ & = - \\sum_i \\big(y^{(i)} - \\phi(z)_{A}^{(i)}\\big)x_{j}^{(i)} \\end{aligned}","title":"Deriving the Gradient Descent Rule for Linear Regression and Adaline"},{"location":"user_guide/general_concepts/regularization-linear/","text":"Regularization of Generalized Linear Models Overview We can understand regularization as an approach of adding an additional bias to a model to reduce the degree of overfitting in models that suffer from high variance. By adding regularization terms to the cost function, we penalize large model coefficients (weights); effectively, we are reducing the complexity of the model. L2 regularization In L2 regularization, we shrink the weights by computing the squared Euclidean norm of the weight coefficients (the weight vector \\mathbf{w} ); \\lambda is the regularization parameter to be optimized. 
L2: \\lambda\\; \\lVert \\mathbf{w} \\rVert_2^2 = \\lambda \\sum_{j=1}^{m} w_j^2 For example, we can regularize the sum of squared errors cost function (SSE) as follows: SSE = \\sum^{n}_{i=1} \\big(\\text{target}^{(i)} - \\text{output}^{(i)}\\big)^2 + L2 Intuitively, we can think of regularization as an additional penalty term or constraint as shown in the figure below. Without regularization, our objective is to find the global cost minimum. By adding a regularization penalty, our objective becomes minimizing the cost function under the constraint that we have to stay within our \"budget\" (the gray-shaded ball). In addition, we can control the regularization strength via the regularization parameter \\lambda . The larger the value of \\lambda , the stronger the regularization of the model. The weight coefficients approach 0 when \\lambda goes towards infinity. L1 regularization In L1 regularization, we shrink the weights using the absolute values of the weight coefficients (the weight vector \\mathbf{w} ); \\lambda is the regularization parameter to be optimized. L1: \\lambda \\; \\lVert\\mathbf{w}\\rVert_1 = \\lambda \\sum_{j=1}^{m} |w_j| For example, we can regularize the sum of squared errors cost function (SSE) as follows: SSE = \\sum^{n}_{i=1} \\big(\\text{target}^{(i)} - \\text{output}^{(i)}\\big)^2 + L1 At its core, L1 regularization is very similar to L2 regularization. However, instead of a quadratic penalty term as in L2, we penalize the model by the absolute values of the weight coefficients. As we can see in the figure below, our \"budget\" has \"sharp edges,\" which is the geometric interpretation of why L1 regularization induces sparsity. References [1] M. Y. Park and T. Hastie. \"L1-regularization path algorithm for generalized linear models\" . Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):659\u2013677, 2007. [2] A. Y. Ng. \"Feature selection, L1 vs. L2 regularization, and rotational invariance\" . 
In Proceedings of the twenty-first international conference on Machine learning, page 78. ACM, 2004.","title":"Regularization of Generalized Linear Models"},{"location":"user_guide/general_concepts/regularization-linear/#regularization-of-generalized-linear-models","text":"","title":"Regularization of Generalized Linear Models"},{"location":"user_guide/general_concepts/regularization-linear/#overview","text":"We can understand regularization as an approach of adding an additional bias to a model to reduce the degree of overfitting in models that suffer from high variance. By adding regularization terms to the cost function, we penalize large model coefficients (weights); effectively, we are reducing the complexity of the model.","title":"Overview"},{"location":"user_guide/general_concepts/regularization-linear/#l2-regularization","text":"In L2 regularization, we shrink the weights by penalizing the squared Euclidean norm of the weight coefficients (the weight vector \\mathbf{w} ); \\lambda is the regularization parameter to be optimized. L2: \\lambda\\; \\lVert \\mathbf{w} \\rVert_2^2 = \\lambda \\sum_{j=1}^{m} w_j^2 For example, we can regularize the sum of squared errors cost function (SSE) as follows: SSE = \\sum^{n}_{i=1} \\big(\\text{target}^{(i)} - \\text{output}^{(i)}\\big)^2 + L2 Intuitively, we can think of regularization as an additional penalty term or constraint as shown in the figure below. Without regularization, our objective is to find the global cost minimum. By adding a regularization penalty, our objective becomes minimizing the cost function under the constraint that we have to stay within our \"budget\" (the gray-shaded ball). In addition, we can control the regularization strength via the regularization parameter \\lambda . The larger the value of \\lambda , the stronger the regularization of the model. 
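The shrinking effect of the regularization strength can be illustrated with a single weight. A minimal sketch, assuming a model without intercept, y = w * x, fit by minimizing SSE plus lam * w**2 (lam stands in for lambda, and ridge_weight is a hypothetical helper): setting the derivative of sum_i (y_i - w * x_i)**2 + lam * w**2 to zero gives w = sum(x*y) / (sum(x*x) + lam).

```python
def ridge_weight(x, y, lam):
    # Hypothetical helper: minimizer of sum_i (y_i - w * x_i)**2 + lam * w**2
    # for a single weight and no intercept (assumption). Setting the derivative
    # -2 * sum_i x_i * (y_i - w * x_i) + 2 * lam * w to zero gives this closed form.
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # toy data on the line y = 2x
for lam in [0.0, 1.0, 10.0, 1000.0]:
    print(lam, ridge_weight(x, y, lam))
```

With lam = 0 the unregularized least-squares weight 2.0 is recovered; as lam grows, the weight shrinks toward 0 but does not reach it for any finite lam.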
The weight coefficients approach 0 when \\lambda goes towards infinity.","title":"L2 regularization"},{"location":"user_guide/general_concepts/regularization-linear/#l1-regularization","text":"In L1 regularization, we shrink the weights using the absolute values of the weight coefficients (the weight vector \\mathbf{w} ); \\lambda is the regularization parameter to be optimized. L1: \\lambda \\; \\lVert\\mathbf{w}\\rVert_1 = \\lambda \\sum_{j=1}^{m} |w_j| For example, we can regularize the sum of squared errors cost function (SSE) as follows: SSE = \\sum^{n}_{i=1} \\big(\\text{target}^{(i)} - \\text{output}^{(i)}\\big)^2 + L1 At its core, L1 regularization is very similar to L2 regularization. However, instead of a quadratic penalty term as in L2, we penalize the model by the absolute values of the weight coefficients. As we can see in the figure below, our \"budget\" has \"sharp edges,\" which is the geometric interpretation of why L1 regularization induces sparsity.","title":"L1 regularization"},{"location":"user_guide/general_concepts/regularization-linear/#references","text":"[1] M. Y. Park and T. Hastie. \"L1-regularization path algorithm for generalized linear models\" . Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):659\u2013677, 2007. [2] A. Y. Ng. \"Feature selection, L1 vs. L2 regularization, and rotational invariance\" . In Proceedings of the twenty-first international conference on Machine learning, page 78. ACM, 2004.","title":"References"},{"location":"user_guide/image/extract_face_landmarks/","text":"Extract Face Landmarks A function to extract facial landmarks. 
from mlxtend.image import extract_face_landmarks Overview The extract_face_landmarks function detects faces in a given image and returns the face landmark points (also known as the face shape) of the first face found in the image, based on dlib's face landmark detection code (http://dlib.net/face_landmark_detection_ex.cpp.html): The face detector we use is made using the classic Histogram of Oriented Gradients (HOG) feature combined with a linear classifier, an image pyramid, and sliding window detection scheme. The pose estimator was created by using dlib's implementation of the paper: One Millisecond Face Alignment with an Ensemble of Regression Trees by Vahid Kazemi and Josephine Sullivan, CVPR 2014 and was trained on the iBUG 300-W face landmark dataset (see https://ibug.doc.ic.ac.uk/resources/facial-point-annotations/): C. Sagonas, E. Antonakos, G. Tzimiropoulos, S. Zafeiriou, M. Pantic. 300 faces In-the-wild challenge: Database and results. Image and Vision Computing (IMAVIS), Special Issue on Facial Landmark Localisation \"In-The-Wild\". 2016. You can get the trained model file from: http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2. Note that the license for the iBUG 300-W dataset excludes commercial use. So you should contact Imperial College London to find out if it's OK for you to use this model file in a commercial product. References Kazemi, Vahid, and Josephine Sullivan. \"One millisecond face alignment with an ensemble of regression trees.\" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. 
Example 1 import imageio import matplotlib.pyplot as plt from mlxtend.image import extract_face_landmarks img = imageio.imread('lena.png') landmarks = extract_face_landmarks(img) print(landmarks.shape) print('\\n\\nFirst 10 landmarks:\\n', landmarks[:10]) (68, 2) First 10 landmarks: [[206 266] [204 290] [205 314] [209 337] [220 357] [236 374] [253 387] [273 397] [290 398] [304 391]] Visualization of the landmarks: fig = plt.figure(figsize=(15, 5)) ax = fig.add_subplot(1, 3, 1) ax.imshow(img) ax = fig.add_subplot(1, 3, 2) ax.scatter(landmarks[:, 0], -landmarks[:, 1], alpha=0.8) ax = fig.add_subplot(1, 3, 3) img2 = img.copy() for p in landmarks: img2[p[1]-3:p[1]+3,p[0]-3:p[0]+3,:] = (255, 255, 255) ax.imshow(img2) plt.show() API extract_face_landmarks(img, return_dtype= ) Function to extract face landmarks. Note that this function requires an installation of the Python version of the library \"dlib\": http://dlib.net Parameters img : array, shape = [h, w, ?] numpy array of a face image. Supported shapes are - 3D tensors with 1 or more color channels, for example, RGB: [h, w, 3] - 2D tensors without color channel, for example, Grayscale: [h, w] return_dtype: the return data-type of the array, default: np.int32. Returns landmarks : numpy.ndarray, shape = [68, 2] A numpy array, where each row contains the x-y coordinates of one landmark point. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/sources/image/extract_face_landmarks.ipynb","title":"Extract Face Landmarks"},{"location":"user_guide/image/extract_face_landmarks/#extract-face-landmarks","text":"A function to extract facial landmarks. 
from mlxtend.image import extract_face_landmarks","title":"Extract Face Landmarks"},{"location":"user_guide/image/extract_face_landmarks/#overview","text":"The extract_face_landmarks function detects faces in a given image and returns the face landmark points (also known as the face shape) of the first face found in the image, based on dlib's face landmark detection code (http://dlib.net/face_landmark_detection_ex.cpp.html): The face detector we use is made using the classic Histogram of Oriented Gradients (HOG) feature combined with a linear classifier, an image pyramid, and sliding window detection scheme. The pose estimator was created by using dlib's implementation of the paper: One Millisecond Face Alignment with an Ensemble of Regression Trees by Vahid Kazemi and Josephine Sullivan, CVPR 2014 and was trained on the iBUG 300-W face landmark dataset (see https://ibug.doc.ic.ac.uk/resources/facial-point-annotations/): C. Sagonas, E. Antonakos, G. Tzimiropoulos, S. Zafeiriou, M. Pantic. 300 faces In-the-wild challenge: Database and results. Image and Vision Computing (IMAVIS), Special Issue on Facial Landmark Localisation \"In-The-Wild\". 2016. You can get the trained model file from: http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2. Note that the license for the iBUG 300-W dataset excludes commercial use. So you should contact Imperial College London to find out if it's OK for you to use this model file in a commercial product.","title":"Overview"},{"location":"user_guide/image/extract_face_landmarks/#references","text":"Kazemi, Vahid, and Josephine Sullivan. \"One millisecond face alignment with an ensemble of regression trees.\" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
2014.","title":"References"},{"location":"user_guide/image/extract_face_landmarks/#example-1","text":"import imageio import matplotlib.pyplot as plt from mlxtend.image import extract_face_landmarks img = imageio.imread('lena.png') landmarks = extract_face_landmarks(img) print(landmarks.shape) print('\\n\\nFirst 10 landmarks:\\n', landmarks[:10]) (68, 2) First 10 landmarks: [[206 266] [204 290] [205 314] [209 337] [220 357] [236 374] [253 387] [273 397] [290 398] [304 391]] Visualization of the landmarks: fig = plt.figure(figsize=(15, 5)) ax = fig.add_subplot(1, 3, 1) ax.imshow(img) ax = fig.add_subplot(1, 3, 2) ax.scatter(landmarks[:, 0], -landmarks[:, 1], alpha=0.8) ax = fig.add_subplot(1, 3, 3) img2 = img.copy() for p in landmarks: img2[p[1]-3:p[1]+3,p[0]-3:p[0]+3,:] = (255, 255, 255) ax.imshow(img2) plt.show()","title":"Example 1"},{"location":"user_guide/image/extract_face_landmarks/#api","text":"extract_face_landmarks(img, return_dtype= ) Function to extract face landmarks. Note that this function requires an installation of the Python version of the library \"dlib\": http://dlib.net Parameters img : array, shape = [h, w, ?] numpy array of a face image. Supported shapes are - 3D tensors with 1 or more color channels, for example, RGB: [h, w, 3] - 2D tensors without color channel, for example, Grayscale: [h, w] return_dtype: the return data-type of the array, default: np.int32. Returns landmarks : numpy.ndarray, shape = [68, 2] A numpy array, where each row contains the x-y coordinates of one landmark point. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/sources/image/extract_face_landmarks.ipynb","title":"API"},{"location":"user_guide/math/num_combinations/","text":"Compute the Number of Combinations A function to calculate the number of combinations for creating subsequences of k elements out of a sequence with n elements. 
from mlxtend.math import num_combinations Overview Combinations are selections of items from a collection regardless of the order in which they appear (in contrast to permutations). For example, let's consider a combination of 3 elements (k=3) from a collection of 5 elements (n=5): collection: {1, 2, 3, 4, 5} combination 1a: {1, 3, 5} combination 1b: {1, 5, 3} combination 1c: {3, 5, 1} ... combination 2: {1, 3, 4} In the example above, the combinations 1a, 1b, and 1c are the \"same combination\" and counted as \"1 possible way to combine items 1, 3, and 5\" -- in combinations, the order does not matter. The number of ways to combine elements ( without replacement ) from a collection with size n into subsets of size k is computed via the binomial coefficient (\" n choose k \"): \\begin{pmatrix} n \\\\ k \\end{pmatrix} = \\frac{n(n-1)\\ldots(n-k+1)}{k(k-1)\\dots1} = \\frac{n!}{k!(n-k)!} To compute the number of combinations with replacement , the following alternative equation is used (\" n multichoose k \"): \\left(\\!\\!\\begin{pmatrix} n \\\\ k \\end{pmatrix}\\!\\!\\right) = \\begin{pmatrix} n + k - 1 \\\\ k \\end{pmatrix} References https://en.wikipedia.org/wiki/Combination Example 1 - Compute the number of combinations from mlxtend.math import num_combinations c = num_combinations(n=20, k=8, with_replacement=False) print('Number of ways to combine 20 elements' ' into 8 subelements: %d' % c) Number of ways to combine 20 elements into 8 subelements: 125970 from mlxtend.math import num_combinations c = num_combinations(n=20, k=8, with_replacement=True) print('Number of ways to combine 20 elements' ' into 8 subelements (with replacement): %d' % c) Number of ways to combine 20 elements into 8 subelements (with replacement): 2220075 Example 2 - A progress tracking use-case It is often quite useful to track the progress of a computationally expensive task to estimate its runtime. 
Here, the num_combinations function can be used to compute the total number of iterations of a combinations iterable from itertools: import itertools import sys import time from mlxtend.math import num_combinations items = {1, 2, 3, 4, 5, 6, 7, 8} max_iter = num_combinations(n=len(items), k=3, with_replacement=False) for idx, i in enumerate(itertools.combinations(items, r=3)): # do some computation with itemset i time.sleep(0.1) sys.stdout.write('\\rProgress: %d/%d' % (idx + 1, max_iter)) sys.stdout.flush() Progress: 56/56 API num_combinations(n, k, with_replacement=False) Function to calculate the number of possible combinations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool (default: False) Allows repeated elements if True. Returns comb : int Number of possible combinations. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_combinations/","title":"Compute the Number of Combinations"},{"location":"user_guide/math/num_combinations/#compute-the-number-of-combinations","text":"A function to calculate the number of combinations for creating subsequences of k elements out of a sequence with n elements. from mlxtend.math import num_combinations","title":"Compute the Number of Combinations"},{"location":"user_guide/math/num_combinations/#overview","text":"Combinations are selections of items from a collection regardless of the order in which they appear (in contrast to permutations). For example, let's consider a combination of 3 elements (k=3) from a collection of 5 elements (n=5): collection: {1, 2, 3, 4, 5} combination 1a: {1, 3, 5} combination 1b: {1, 5, 3} combination 1c: {3, 5, 1} ... combination 2: {1, 3, 4} In the example above, the combinations 1a, 1b, and 1c are the \"same combination\" and counted as \"1 possible way to combine items 1, 3, and 5\" -- in combinations, the order does not matter. 
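The claim that 1a, 1b, and 1c count as a single combination can be checked by brute force for this small example. The sketch below uses Python's standard itertools and math modules rather than mlxtend (math.comb requires Python 3.8 or newer): it enumerates all ordered selections of 3 items out of 5, collapses each one to an unordered set, and compares the count with the binomial coefficient.

```python
import itertools
import math

items = [1, 2, 3, 4, 5]

# Ordered selections: (1, 3, 5), (1, 5, 3), (3, 5, 1), ... are all distinct here
ordered = list(itertools.permutations(items, 3))

# Collapsing each selection to a set makes 1a, 1b, and 1c identical
unordered = {frozenset(p) for p in ordered}

print(len(ordered), len(unordered), math.comb(5, 3))  # 60 10 10
```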
The number of ways to combine elements ( without replacement ) from a collection with size n into subsets of size k is computed via the binomial coefficient (\" n choose k \"): \\begin{pmatrix} n \\\\ k \\end{pmatrix} = \\frac{n(n-1)\\ldots(n-k+1)}{k(k-1)\\dots1} = \\frac{n!}{k!(n-k)!} To compute the number of combinations with replacement , the following alternative equation is used (\" n multichoose k \"): \\left(\\!\\!\\begin{pmatrix} n \\\\ k \\end{pmatrix}\\!\\!\\right) = \\begin{pmatrix} n + k - 1 \\\\ k \\end{pmatrix}","title":"Overview"},{"location":"user_guide/math/num_combinations/#references","text":"https://en.wikipedia.org/wiki/Combination","title":"References"},{"location":"user_guide/math/num_combinations/#example-1-compute-the-number-of-combinations","text":"from mlxtend.math import num_combinations c = num_combinations(n=20, k=8, with_replacement=False) print('Number of ways to combine 20 elements' ' into 8 subelements: %d' % c) Number of ways to combine 20 elements into 8 subelements: 125970 from mlxtend.math import num_combinations c = num_combinations(n=20, k=8, with_replacement=True) print('Number of ways to combine 20 elements' ' into 8 subelements (with replacement): %d' % c) Number of ways to combine 20 elements into 8 subelements (with replacement): 2220075","title":"Example 1 - Compute the number of combinations"},{"location":"user_guide/math/num_combinations/#example-2-a-progress-tracking-use-case","text":"It is often quite useful to track the progress of a computationally expensive task to estimate its runtime. 
Here, the num_combinations function can be used to compute the total number of iterations of a combinations iterable from itertools: import itertools import sys import time from mlxtend.math import num_combinations items = {1, 2, 3, 4, 5, 6, 7, 8} max_iter = num_combinations(n=len(items), k=3, with_replacement=False) for idx, i in enumerate(itertools.combinations(items, r=3)): # do some computation with itemset i time.sleep(0.1) sys.stdout.write('\\rProgress: %d/%d' % (idx + 1, max_iter)) sys.stdout.flush() Progress: 56/56","title":"Example 2 - A progress tracking use-case"},{"location":"user_guide/math/num_combinations/#api","text":"num_combinations(n, k, with_replacement=False) Function to calculate the number of possible combinations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool (default: False) Allows repeated elements if True. Returns comb : int Number of possible combinations. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_combinations/","title":"API"},{"location":"user_guide/math/num_permutations/","text":"Compute the Number of Permutations A function to calculate the number of permutations for creating subsequences of k elements out of a sequence with n elements. from mlxtend.math import num_permutations Overview Permutations are selections of items from a collection with regard to the order in which they appear (in contrast to combinations). For example, let's consider a permutation of 3 elements (k=3) from a collection of 5 elements (n=5): collection: {1, 2, 3, 4, 5} permutation 1a: {1, 3, 5} permutation 1b: {1, 5, 3} permutation 1c: {3, 5, 1} ... permutation 2: {1, 3, 4} In the example above, the permutations 1a, 1b, and 1c are the \"same combination\" but distinct permutations -- in combinations, the order does not matter, but in permutations it does. 
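For a small case, the permutation counts can be verified by enumeration with Python's standard library rather than mlxtend (math.perm requires Python 3.8 or newer; n = 5 and k = 3 are arbitrary choices for illustration). itertools.permutations yields the ordered selections without replacement, which should number n!/(n-k)!, and itertools.product yields those with replacement, which should number n**k.

```python
import itertools
import math

n, k = 5, 3
items = range(1, n + 1)

# Ordered selections without replacement: n! / (n - k)!
without_repl = sum(1 for _ in itertools.permutations(items, k))
closed_form = math.factorial(n) // math.factorial(n - k)

# Ordered selections with replacement: n ** k
with_repl = sum(1 for _ in itertools.product(items, repeat=k))

print(without_repl, closed_form, math.perm(n, k))  # 60 60 60
print(with_repl, n ** k)  # 125 125
```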
The number of ways to arrange elements ( without replacement ) from a collection with size n into ordered subsequences of size k is obtained by multiplying the binomial coefficient (\" n choose k \") by the k! possible orderings of each subset: k!\\begin{pmatrix} n \\\\ k \\end{pmatrix} = k! \\cdot \\frac{n!}{k!(n-k)!} = \\frac{n!}{(n-k)!} To compute the number of permutations with replacement , we simply need to compute n^k . References https://en.wikipedia.org/wiki/Permutation Example 1 - Compute the number of permutations from mlxtend.math import num_permutations c = num_permutations(n=20, k=8, with_replacement=False) print('Number of ways to permute 20 elements' ' into 8 subelements: %d' % c) Number of ways to permute 20 elements into 8 subelements: 5079110400 from mlxtend.math import num_permutations c = num_permutations(n=20, k=8, with_replacement=True) print('Number of ways to permute 20 elements' ' into 8 subelements (with replacement): %d' % c) Number of ways to permute 20 elements into 8 subelements (with replacement): 25600000000 Example 2 - A progress tracking use-case It is often quite useful to track the progress of a computationally expensive task to estimate its runtime. Here, the num_permutations function can be used to compute the total number of iterations of a permutations iterable from itertools: import itertools import sys import time from mlxtend.math import num_permutations items = {1, 2, 3, 4, 5, 6, 7, 8} max_iter = num_permutations(n=len(items), k=3, with_replacement=False) for idx, i in enumerate(itertools.permutations(items, r=3)): # do some computation with itemset i time.sleep(0.01) sys.stdout.write('\\rProgress: %d/%d' % (idx + 1, max_iter)) sys.stdout.flush() Progress: 336/336 API num_permutations(n, k, with_replacement=False) Function to calculate the number of possible permutations. Parameters n : int Total number of items. k : int Number of elements of the target itemset. with_replacement : bool Allows repeated elements if True. Returns permut : int Number of possible permutations. 
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_permutations/","title":"Compute the Number of Permutations"},{"location":"user_guide/math/num_permutations/#compute-the-number-of-permutations","text":"A function to calculate the number of permutations for creating subsequences of k elements out of a sequence with n elements. from mlxtend.math import num_permutations","title":"Compute the Number of Permutations"},{"location":"user_guide/math/num_permutations/#overview","text":"Permutations are selections of items from a collection with regard to the order in which they appear (in contrast to combinations). For example, let's consider a permutation of 3 elements (k=3) from a collection of 5 elements (n=5): collection: {1, 2, 3, 4, 5} permutation 1a: {1, 3, 5} permutation 1b: {1, 5, 3} permutation 1c: {3, 5, 1} ... permutation 2: {1, 3, 4} In the example above, the permutations 1a, 1b, and 1c are the \"same combination\" but distinct permutations -- in combinations, the order does not matter, but in permutations it does. The number of ways to arrange elements ( without replacement ) from a collection with size n into ordered subsequences of size k is obtained by multiplying the binomial coefficient (\" n choose k \") by the k! possible orderings of each subset: k!\\begin{pmatrix} n \\\\ k \\end{pmatrix} = k! 
\\cdot \\frac{n!}{k!(n-k)!} = \\frac{n!}{(n-k)!} To compute the number of permutations with replacement , we simply need to compute n^k .","title":"Overview"},{"location":"user_guide/math/num_permutations/#references","text":"https://en.wikipedia.org/wiki/Permutation","title":"References"},{"location":"user_guide/math/num_permutations/#example-1-compute-the-number-of-permutations","text":"from mlxtend.math import num_permutations c = num_permutations(n=20, k=8, with_replacement=False) print('Number of ways to permute 20 elements' ' into 8 subelements: %d' % c) Number of ways to permute 20 elements into 8 subelements: 5079110400 from mlxtend.math import num_permutations c = num_permutations(n=20, k=8, with_replacement=True) print('Number of ways to permute 20 elements' ' into 8 subelements (with replacement): %d' % c) Number of ways to permute 20 elements into 8 subelements (with replacement): 25600000000","title":"Example 1 - Compute the number of permutations"},{"location":"user_guide/math/num_permutations/#example-2-a-progress-tracking-use-case","text":"It is often quite useful to track the progress of a computationally expensive task to estimate its runtime. Here, the num_permutations function can be used to compute the total number of iterations of a permutations iterable from itertools: import itertools import sys import time from mlxtend.math import num_permutations items = {1, 2, 3, 4, 5, 6, 7, 8} max_iter = num_permutations(n=len(items), k=3, with_replacement=False) for idx, i in enumerate(itertools.permutations(items, r=3)): # do some computation with itemset i time.sleep(0.01) sys.stdout.write('\\rProgress: %d/%d' % (idx + 1, max_iter)) sys.stdout.flush() Progress: 336/336","title":"Example 2 - A progress tracking use-case"},{"location":"user_guide/math/num_permutations/#api","text":"num_permutations(n, k, with_replacement=False) Function to calculate the number of possible permutations. Parameters n : int Total number of items. 
k : int Number of elements of the target itemset. with_replacement : bool Allows repeated elements if True. Returns permut : int Number of possible permutations. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/math/num_permutations/","title":"API"},{"location":"user_guide/math/vectorspace_dimensionality/","text":"Vectorspace Dimensionality A function to compute the number of dimensions a set of vectors (arranged as columns in a matrix) spans. from mlxtend.math import vectorspace_dimensionality Overview Given a set of vectors, arranged as columns in a matrix, the vectorspace_dimensionality function computes the number of dimensions (i.e., hyper-volume) that the vectorspace spans using the Gram-Schmidt process [1]. In particular, since the Gram-Schmidt process yields vectors that are zero or normalized to 1 (i.e., an orthonormal vectorset if the input was a set of linearly independent vectors), the sum of the vector norms corresponds to the number of dimensions of a vectorset. References [1] https://en.wikipedia.org/wiki/Gram\u2013Schmidt_process Example 1 - Compute the dimensions of a vectorspace Let's assume we have the two basis vectors x=[1 \\;\\;\\; 0]^T and y=[0\\;\\;\\; 1]^T as columns in a matrix. 
Due to the linear independence of the two vectors, the space that they span is naturally a plane (2D space): import numpy as np from mlxtend.math import vectorspace_dimensionality a = np.array([[1, 0], [0, 1]]) vectorspace_dimensionality(a) 2 However, if one vector is a linear combination of the other, it's intuitive to see that the space the vectorset describes is merely a line, aka a 1D space: b = np.array([[1, 2], [0, 0]]) vectorspace_dimensionality(b) 1 If 3 vectors are all linearly independent of each other, the dimensionality of the vector space is a volume (i.e., a 3D space): d = np.array([[1, 9, 1], [3, 2, 2], [5, 4, 3]]) vectorspace_dimensionality(d) 3 Again, if a pair of vectors is linearly dependent (here: the 1st and the 2nd column), this reduces the dimensionality by 1: c = np.array([[1, 2, 1], [3, 6, 2], [5, 10, 3]]) vectorspace_dimensionality(c) 2 API vectorspace_dimensionality(ary) Computes the hyper-volume spanned by a vector set Parameters ary : array-like, shape=[num_vectors, num_vectors] A set of vectors (arranged as columns in a matrix) Returns dimensions : int An integer indicating the \"dimensionality\" hyper-volume spanned by the vector set","title":"Vectorspace Dimensionality"},{"location":"user_guide/math/vectorspace_dimensionality/#vectorspace-dimensionality","text":"A function to compute the number of dimensions a set of vectors (arranged as columns in a matrix) spans. from mlxtend.math import vectorspace_dimensionality","title":"Vectorspace Dimensionality"},{"location":"user_guide/math/vectorspace_dimensionality/#overview","text":"Given a set of vectors, arranged as columns in a matrix, the vectorspace_dimensionality function computes the number of dimensions (i.e., hyper-volume) that the vectorspace spans using the Gram-Schmidt process [1]. 
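The approach described above can be sketched in pure Python. This is an illustrative re-implementation under stated assumptions, not mlxtend's code: the vectors are passed as a list of column vectors rather than a NumPy matrix, modified Gram-Schmidt is used, residuals whose norm falls below a tolerance eps are treated as zero vectors, and the helper names gram_schmidt_columns and dimensionality are hypothetical.

```python
import math


def gram_schmidt_columns(cols, eps=1e-10):
    # Modified Gram-Schmidt on a list of column vectors (hypothetical helper)
    basis = []
    for v in cols:
        u = [float(a) for a in v]
        for b in basis:
            proj = sum(ui * bi for ui, bi in zip(u, b))  # component along b
            u = [ui - proj * bi for ui, bi in zip(u, b)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        # A linearly dependent column leaves a (near-)zero residual: keep it as zeros
        basis.append([ui / norm for ui in u] if norm > eps else [0.0] * len(u))
    return basis


def dimensionality(cols):
    # Every basis vector has norm 1 or 0, so the sum of norms counts the dimensions
    norms = [math.sqrt(sum(a * a for a in b)) for b in gram_schmidt_columns(cols)]
    return round(sum(norms))


print(dimensionality([[1, 0], [0, 1]]))  # 2: independent columns span a plane
print(dimensionality([[1, 0], [2, 0]]))  # 1: second column is twice the first
```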
In particular, since the Gram-Schmidt process yields vectors that are zero or normalized to 1 (i.e., an orthonormal vectorset if the input was a set of linearly independent vectors), the sum of the vector norms corresponds to the number of dimensions of a vectorset.","title":"Overview"},{"location":"user_guide/math/vectorspace_dimensionality/#references","text":"[1] https://en.wikipedia.org/wiki/Gram\u2013Schmidt_process","title":"References"},{"location":"user_guide/math/vectorspace_dimensionality/#example-1-compute-the-dimensions-of-a-vectorspace","text":"Let's assume we have the two basis vectors x=[1 \\;\\;\\; 0]^T and y=[0\\;\\;\\; 1]^T as columns in a matrix. Due to the linear independence of the two vectors, the space that they span is naturally a plane (2D space): import numpy as np from mlxtend.math import vectorspace_dimensionality a = np.array([[1, 0], [0, 1]]) vectorspace_dimensionality(a) 2 However, if one vector is a linear combination of the other, it's intuitive to see that the space the vectorset describes is merely a line, aka a 1D space: b = np.array([[1, 2], [0, 0]]) vectorspace_dimensionality(b) 1 If 3 vectors are all linearly independent of each other, the dimensionality of the vector space is a volume (i.e., a 3D space): d = np.array([[1, 9, 1], [3, 2, 2], [5, 4, 3]]) vectorspace_dimensionality(d) 3 Again, if a pair of vectors is linearly dependent (here: the 1st and the 2nd column), this reduces the dimensionality by 1: c = np.array([[1, 2, 1], [3, 6, 2], [5, 10, 3]]) vectorspace_dimensionality(c) 2","title":"Example 1 - Compute the dimensions of a vectorspace"},{"location":"user_guide/math/vectorspace_dimensionality/#api","text":"vectorspace_dimensionality(ary) Computes the hyper-volume spanned by a vector set Parameters ary : array-like, shape=[num_vectors, num_vectors] A set of vectors (arranged as columns in a matrix) Returns dimensions : int An integer indicating the \"dimensionality\" hyper-volume spanned by the vector
set","title":"API"},{"location":"user_guide/math/vectorspace_orthonormalization/","text":"Vectorspace Orthonormalization A function that converts a set of linearly independent vectors to a set of orthonormal basis vectors. from mlxtend.math import vectorspace_orthonormalization Overview The vectorspace_orthonormalization function converts a set of linearly independent vectors to a set of orthonormal basis vectors using the Gram-Schmidt process [1]. References [1] https://en.wikipedia.org/wiki/Gram\u2013Schmidt_process Example 1 - Convert a set of vector to an orthonormal basis Note that to convert a set of linearly independent vectors into a set of orthonormal basis vectors, the vectorspace_orthonormalization function expects the vectors to be arranged as columns of a matrix (here: NumPy array). Please keep in mind that the vectorspace_orthonormalization function also accepts linearly dependent vector sets; however, the resulting vector set won't be orthonormal. An easy way to check whether all vectors in the input set are linearly independent is to use the numpy.linalg.det (determinant) function. import numpy as np from mlxtend.math import vectorspace_orthonormalization a = np.array([[2, 0, 4, 12], [0, 2, 16, 4], [4, 16, 6, 2], [2, -12, 4, 6]]) s = '' if np.linalg.det(a) == 0.0: s = ' not' print('Input vectors are%s linearly independent' % s) vectorspace_orthonormalization(a) Input vectors are linearly independent array([[ 0.40824829, -0.1814885 , 0.04982278, 0.89325973], [ 0. , 0.1088931 , 0.99349591, -0.03328918], [ 0.81649658, 0.50816781, -0.06462163, -0.26631346], [ 0.40824829, -0.83484711, 0.07942048, -0.36063281]]) Note that scaling the inputs equally by a factor should leave the results unchanged: vectorspace_orthonormalization(a/2) array([[ 0.40824829, -0.1814885 , 0.04982278, 0.89325973], [ 0. 
, 0.1088931 , 0.99349591, -0.03328918], [ 0.81649658, 0.50816781, -0.06462163, -0.26631346], [ 0.40824829, -0.83484711, 0.07942048, -0.36063281]]) However, in case of linear dependence (the second column is a linear combination of the first column in the example below), the vector elements of one of the dependent vectors will become zero. (For a pair of linearly dependent vectors, the one with the larger column index is the one that gets zeroed.) a[:, 1] = a[:, 0] * 2 vectorspace_orthonormalization(a) array([[ 0.40824829, 0. , 0.04155858, 0.82364839], [ 0. , 0. , 0.99740596, -0.06501108], [ 0.81649658, 0. , -0.04155858, -0.52008861], [ 0.40824829, 0. , 0.04155858, 0.21652883]]) API vectorspace_orthonormalization(ary, eps=1e-13) Transforms a set of column vectors to an orthonormal basis. Given a set of linearly independent vectors, this function converts such column vectors, arranged in a matrix, into orthonormal basis vectors. Parameters ary : array-like, shape=[num_vectors, num_vectors] A set of vectors (arranged as columns in a matrix) eps : float (default: 1e-13) A small tolerance value to determine whether the vector norm is zero or not. Returns arr : array-like, shape=[num_vectors, num_vectors] An orthonormal set of vectors (arranged as columns)","title":"Vectorspace Orthonormalization"},{"location":"user_guide/math/vectorspace_orthonormalization/#vectorspace-orthonormalization","text":"A function that converts a set of linearly independent vectors to a set of orthonormal basis vectors. 
from mlxtend.math import vectorspace_orthonormalization","title":"Vectorspace Orthonormalization"},{"location":"user_guide/math/vectorspace_orthonormalization/#overview","text":"The vectorspace_orthonormalization function converts a set of linearly independent vectors to a set of orthonormal basis vectors using the Gram-Schmidt process [1].","title":"Overview"},{"location":"user_guide/math/vectorspace_orthonormalization/#references","text":"[1] https://en.wikipedia.org/wiki/Gram\u2013Schmidt_process","title":"References"},{"location":"user_guide/math/vectorspace_orthonormalization/#example-1-convert-a-set-of-vector-to-an-orthonormal-basis","text":"Note that to convert a set of linearly independent vectors into a set of orthonormal basis vectors, the vectorspace_orthonormalization function expects the vectors to be arranged as columns of a matrix (here: NumPy array). Please keep in mind that the vectorspace_orthonormalization function also accepts linearly dependent vector sets; however, the resulting vector set won't be orthonormal. An easy way to check whether all vectors in the input set are linearly independent is to use the numpy.linalg.det (determinant) function: for a square matrix, a nonzero determinant means that its columns are linearly independent. import numpy as np from mlxtend.math import vectorspace_orthonormalization a = np.array([[2, 0, 4, 12], [0, 2, 16, 4], [4, 16, 6, 2], [2, -12, 4, 6]]) s = '' if np.linalg.det(a) == 0.0: s = ' not' print('Input vectors are%s linearly independent' % s) vectorspace_orthonormalization(a) Input vectors are linearly independent array([[ 0.40824829, -0.1814885 , 0.04982278, 0.89325973], [ 0. , 0.1088931 , 0.99349591, -0.03328918], [ 0.81649658, 0.50816781, -0.06462163, -0.26631346], [ 0.40824829, -0.83484711, 0.07942048, -0.36063281]]) Note that scaling the inputs equally by a factor should leave the results unchanged: vectorspace_orthonormalization(a/2) array([[ 0.40824829, -0.1814885 , 0.04982278, 0.89325973], [ 0.
, 0.1088931 , 0.99349591, -0.03328918], [ 0.81649658, 0.50816781, -0.06462163, -0.26631346], [ 0.40824829, -0.83484711, 0.07942048, -0.36063281]]) However, in case of linear dependence (in the example below, the second column is a multiple of the first column), all elements of one of the dependent vectors become zero. (For a pair of linearly dependent vectors, the one with the larger column index is the one that is zeroed.) a[:, 1] = a[:, 0] * 2 vectorspace_orthonormalization(a) array([[ 0.40824829, 0. , 0.04155858, 0.82364839], [ 0. , 0. , 0.99740596, -0.06501108], [ 0.81649658, 0. , -0.04155858, -0.52008861], [ 0.40824829, 0. , 0.04155858, 0.21652883]])","title":"Example 1 - Convert a set of vector to an orthonormal basis"},{"location":"user_guide/math/vectorspace_orthonormalization/#api","text":"vectorspace_orthonormalization(ary, eps=1e-13) Transforms a set of column vectors to an orthonormal basis. Given a set of linearly independent vectors, this function converts such column vectors, arranged in a matrix, into orthonormal basis vectors. Parameters ary : array-like, shape=[num_vectors, num_vectors] A set of vectors (arranged as columns in a matrix) eps : float (default: 1e-13) A small tolerance value to determine whether the vector norm is zero or not. Returns arr : array-like, shape=[num_vectors, num_vectors] An orthonormal set of vectors (arranged as columns)","title":"API"},{"location":"user_guide/plotting/category_scatter/","text":"Scatterplot with Categories A function to quickly produce a scatter plot colored by categories from a pandas DataFrame or NumPy ndarray object.
from mlxtend.plotting import category_scatter Overview References - Example 1 - Category Scatter from Pandas DataFrames import pandas as pd from io import StringIO csvfile = \"\"\"label,x,y class1,10.0,8.04 class1,10.5,7.30 class2,8.3,5.5 class2,8.1,5.9 class3,3.5,3.5 class3,3.8,5.1\"\"\" df = pd.read_csv(StringIO(csvfile)) df label x y 0 class1 10.0 8.04 1 class1 10.5 7.30 2 class2 8.3 5.50 3 class2 8.1 5.90 4 class3 3.5 3.50 5 class3 3.8 5.10 Plotting the data, where the categories are determined by the unique values in the label column label_col. The x and y arguments are simply the column names of the DataFrame that we want to plot. import matplotlib.pyplot as plt from mlxtend.plotting import category_scatter fig = category_scatter(x='x', y='y', label_col='label', data=df, legend_loc='upper left') Example 2 - Category Scatter from NumPy Arrays import numpy as np from io import BytesIO csvfile = \"\"\"1,10.0,8.04 1,10.5,7.30 2,8.3,5.5 2,8.1,5.9 3,3.5,3.5 3,3.8,5.1\"\"\" ary = np.genfromtxt(BytesIO(csvfile.encode()), delimiter=',') ary array([[ 1. , 10. , 8.04], [ 1. , 10.5 , 7.3 ], [ 2. , 8.3 , 5.5 ], [ 2. , 8.1 , 5.9 ], [ 3. , 3.5 , 3.5 ], [ 3. , 3.8 , 5.1 ]]) Now, let's pretend that the first column represents the labels and that the second and third columns represent the x and y values, respectively. import matplotlib.pyplot as plt from mlxtend.plotting import category_scatter fig = category_scatter(x=1, y=2, label_col=0, data=ary, legend_loc='upper left') API category_scatter(x, y, label_col, data, markers='sxo^v', colors=('blue', 'green', 'red', 'purple', 'gray', 'cyan'), alpha=0.7, markersize=20.0, legend_loc='best') Scatter plot that plots categories in different colors/marker styles. Parameters x : str or int DataFrame column name of the x-axis values or integer for the numpy ndarray column index. y : str or int DataFrame column name of the y-axis values or integer for the numpy ndarray column index. data : Pandas DataFrame object or NumPy ndarray.
markers : str Markers that are cycled through for the label categories. colors : tuple Colors that are cycled through for the label categories. alpha : float (default: 0.7) Parameter to control the transparency. markersize : float (default: 20.0) Parameter to control the marker size. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right} No legend if legend_loc=False Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/","title":"Scatterplot with Categories"},{"location":"user_guide/plotting/category_scatter/#scatterplot-with-categories","text":"A function to quickly produce a scatter plot colored by categories from a pandas DataFrame or NumPy ndarray object. from mlxtend.plotting import category_scatter","title":"Scatterplot with Categories"},{"location":"user_guide/plotting/category_scatter/#overview","text":"","title":"Overview"},{"location":"user_guide/plotting/category_scatter/#references","text":"-","title":"References"},{"location":"user_guide/plotting/category_scatter/#example-1-category-scatter-from-pandas-dataframes","text":"import pandas as pd from io import StringIO csvfile = \"\"\"label,x,y class1,10.0,8.04 class1,10.5,7.30 class2,8.3,5.5 class2,8.1,5.9 class3,3.5,3.5 class3,3.8,5.1\"\"\" df = pd.read_csv(StringIO(csvfile)) df label x y 0 class1 10.0 8.04 1 class1 10.5 7.30 2 class2 8.3 5.50 3 class2 8.1 5.90 4 class3 3.5 3.50 5 class3 3.8 5.10 Plotting the data, where the categories are determined by the unique values in the label column label_col. The x and y arguments are simply the column names of the DataFrame that we want to plot.
import matplotlib.pyplot as plt from mlxtend.plotting import category_scatter fig = category_scatter(x='x', y='y', label_col='label', data=df, legend_loc='upper left')","title":"Example 1 - Category Scatter from Pandas DataFrames"},{"location":"user_guide/plotting/category_scatter/#example-2-category-scatter-from-numpy-arrays","text":"import numpy as np from io import BytesIO csvfile = \"\"\"1,10.0,8.04 1,10.5,7.30 2,8.3,5.5 2,8.1,5.9 3,3.5,3.5 3,3.8,5.1\"\"\" ary = np.genfromtxt(BytesIO(csvfile.encode()), delimiter=',') ary array([[ 1. , 10. , 8.04], [ 1. , 10.5 , 7.3 ], [ 2. , 8.3 , 5.5 ], [ 2. , 8.1 , 5.9 ], [ 3. , 3.5 , 3.5 ], [ 3. , 3.8 , 5.1 ]]) Now, let's pretend that the first column represents the labels and that the second and third columns represent the x and y values, respectively. import matplotlib.pyplot as plt from mlxtend.plotting import category_scatter fig = category_scatter(x=1, y=2, label_col=0, data=ary, legend_loc='upper left')","title":"Example 2 - Category Scatter from NumPy Arrays"},{"location":"user_guide/plotting/category_scatter/#api","text":"category_scatter(x, y, label_col, data, markers='sxo^v', colors=('blue', 'green', 'red', 'purple', 'gray', 'cyan'), alpha=0.7, markersize=20.0, legend_loc='best') Scatter plot that plots categories in different colors/marker styles. Parameters x : str or int DataFrame column name of the x-axis values or integer for the numpy ndarray column index. y : str or int DataFrame column name of the y-axis values or integer for the numpy ndarray column index. data : Pandas DataFrame object or NumPy ndarray. markers : str Markers that are cycled through for the label categories. colors : tuple Colors that are cycled through for the label categories. alpha : float (default: 0.7) Parameter to control the transparency. markersize : float (default: 20.0) Parameter to control the marker size.
legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right} No legend if legend_loc=False Returns fig : matplotlib.pyplot figure object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/","title":"API"},{"location":"user_guide/plotting/checkerboard_plot/","text":"Checkerboard Plot Function to plot a checkerboard plot / heat map via matplotlib. from mlxtend.plotting import checkerboard_plot Overview Function to plot a checkerboard plot / heat map via matplotlib. References - Example 1 - Default from mlxtend.plotting import checkerboard_plot import matplotlib.pyplot as plt import numpy as np ary = np.random.random((5, 4)) brd = checkerboard_plot(ary) plt.show() Example 2 - Changing colors and labels from mlxtend.plotting import checkerboard_plot import matplotlib.pyplot as plt import numpy as np checkerboard_plot(ary, col_labels=['abc', 'def', 'ghi', 'jkl'], row_labels=['sample %d' % i for i in range(1, 6)], cell_colors=['skyblue', 'whitesmoke'], font_colors=['black', 'black'], figsize=(4.5, 5)) plt.show() API checkerboard_plot(ary, cell_colors=('white', 'black'), font_colors=('black', 'white'), fmt='%.1f', figsize=None, row_labels=None, col_labels=None, fontsize=None) Plot a checkerboard table / heatmap via matplotlib. Parameters ary : array-like, shape = [n, m] A 2D NumPy array. cell_colors : tuple or list (default: ('white', 'black')) Tuple or list containing the two colors of the checkerboard pattern. font_colors : tuple or list (default: ('black', 'white')) Font colors corresponding to the cell colors. figsize : tuple (default: (2.5, 2.5)) Width and height of the figure fmt : str (default: '%.1f') Python string formatter for cell values. The default '%.1f' results in floats with 1 digit after the decimal point. Use '%d' to show numbers as integers. row_labels : list (default: None) List of the row labels.
Uses the array row indices 0 to n-1 by default. col_labels : list (default: None) List of the column labels. Uses the array column indices 0 to m-1 by default. fontsize : int (default: None) Specifies the font size of the checkerboard table. Uses matplotlib's default if None. Returns fig : matplotlib Figure object. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/checkerboard_plot/","title":"Checkerboard Plot"},{"location":"user_guide/plotting/checkerboard_plot/#checkerboard-plot","text":"Function to plot a checkerboard plot / heat map via matplotlib. from mlxtend.plotting import checkerboard_plot","title":"Checkerboard Plot"},{"location":"user_guide/plotting/checkerboard_plot/#overview","text":"Function to plot a checkerboard plot / heat map via matplotlib.","title":"Overview"},{"location":"user_guide/plotting/checkerboard_plot/#references","text":"-","title":"References"},{"location":"user_guide/plotting/checkerboard_plot/#example-1-default","text":"from mlxtend.plotting import checkerboard_plot import matplotlib.pyplot as plt import numpy as np ary = np.random.random((5, 4)) brd = checkerboard_plot(ary) plt.show()","title":"Example 1 - Default"},{"location":"user_guide/plotting/checkerboard_plot/#example-2-changing-colors-and-labels","text":"from mlxtend.plotting import checkerboard_plot import matplotlib.pyplot as plt import numpy as np checkerboard_plot(ary, col_labels=['abc', 'def', 'ghi', 'jkl'], row_labels=['sample %d' % i for i in range(1, 6)], cell_colors=['skyblue', 'whitesmoke'], font_colors=['black', 'black'], figsize=(4.5, 5)) plt.show()","title":"Example 2 - Changing colors and labels"},{"location":"user_guide/plotting/checkerboard_plot/#api","text":"checkerboard_plot(ary, cell_colors=('white', 'black'), font_colors=('black', 'white'), fmt='%.1f', figsize=None, row_labels=None, col_labels=None, fontsize=None) Plot a checkerboard table / heatmap via matplotlib.
Parameters ary : array-like, shape = [n, m] A 2D NumPy array. cell_colors : tuple or list (default: ('white', 'black')) Tuple or list containing the two colors of the checkerboard pattern. font_colors : tuple or list (default: ('black', 'white')) Font colors corresponding to the cell colors. figsize : tuple (default: (2.5, 2.5)) Width and height of the figure fmt : str (default: '%.1f') Python string formatter for cell values. The default '%.1f' results in floats with 1 digit after the decimal point. Use '%d' to show numbers as integers. row_labels : list (default: None) List of the row labels. Uses the array row indices 0 to n-1 by default. col_labels : list (default: None) List of the column labels. Uses the array column indices 0 to m-1 by default. fontsize : int (default: None) Specifies the font size of the checkerboard table. Uses matplotlib's default if None. Returns fig : matplotlib Figure object. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/checkerboard_plot/","title":"API"},{"location":"user_guide/plotting/ecdf/","text":"Empirical Cumulative Distribution Function Plot A function to conveniently plot an empirical cumulative distribution function. from mlxtend.plotting import ecdf Overview A function to conveniently plot an empirical cumulative distribution function (ECDF) and add percentile thresholds for exploratory data analysis.
References - Example 1 - ECDF from mlxtend.data import iris_data from mlxtend.plotting import ecdf import matplotlib.pyplot as plt X, y = iris_data() ax, _, _ = ecdf(x=X[:, 0], x_label='sepal length (cm)') plt.show() Example 2 - Multiple ECDFs from mlxtend.data import iris_data from mlxtend.plotting import ecdf import matplotlib.pyplot as plt X, y = iris_data() # first ecdf x1 = X[:, 0] ax, _, _ = ecdf(x1, x_label='cm') # second ecdf x2 = X[:, 1] ax, _, _ = ecdf(x2, ax=ax) plt.legend(['sepal length', 'sepal width']) plt.show() Example 3 - ECDF with Percentile Thresholds from mlxtend.data import iris_data from mlxtend.plotting import ecdf import matplotlib.pyplot as plt X, y = iris_data() ax, threshold, count = ecdf(x=X[:, 0], x_label='sepal length (cm)', percentile=0.8) plt.show() print('Feature threshold at the 80th percentile:', threshold) print('Number of samples below the threshold:', count) Feature threshold at the 80th percentile: 6.5 Number of samples below the threshold: 120 API ecdf(x, y_label='ECDF', x_label=None, ax=None, percentile=None, ecdf_color=None, ecdf_marker='o', percentile_color='black', percentile_linestyle='--') Plots an Empirical Cumulative Distribution Function Parameters x : array or list, shape=[n_samples,] Array-like object containing the feature values y_label : str (default='ECDF') Text label for the y-axis x_label : str (default=None) Text label for the x-axis ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. 
Creates one if ax=None percentile : float (default=None) Float between 0 and 1 for plotting a percentile threshold line ecdf_color : matplotlib color (default=None) Color for the ECDF plot; uses matplotlib defaults if None ecdf_marker : matplotlib marker (default='o') Marker style for the ECDF plot percentile_color : matplotlib color (default='black') Color for the percentile threshold if percentile is not None percentile_linestyle : matplotlib linestyle (default='--') Line style for the percentile threshold if percentile is not None Returns ax : matplotlib.axes.Axes object percentile_threshold : float Feature threshold at the percentile or None if percentile=None percentile_count : int or None Number of samples whose feature value is less than or equal to the feature threshold at the given percentile, or None if percentile=None Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/ecdf/","title":"Empirical Cumulative Distribution Function Plot"},{"location":"user_guide/plotting/ecdf/#empirical-cumulative-distribution-function-plot","text":"A function to conveniently plot an empirical cumulative distribution function.
from mlxtend.plotting import ecdf","title":"Empirical Cumulative Distribution Function Plot"},{"location":"user_guide/plotting/ecdf/#overview","text":"A function to conveniently plot an empirical cumulative distribution function (ECDF) and add percentile thresholds for exploratory data analysis.","title":"Overview"},{"location":"user_guide/plotting/ecdf/#references","text":"-","title":"References"},{"location":"user_guide/plotting/ecdf/#example-1-ecdf","text":"from mlxtend.data import iris_data from mlxtend.plotting import ecdf import matplotlib.pyplot as plt X, y = iris_data() ax, _, _ = ecdf(x=X[:, 0], x_label='sepal length (cm)') plt.show()","title":"Example 1 - ECDF"},{"location":"user_guide/plotting/ecdf/#example-2-multiple-ecdfs","text":"from mlxtend.data import iris_data from mlxtend.plotting import ecdf import matplotlib.pyplot as plt X, y = iris_data() # first ecdf x1 = X[:, 0] ax, _, _ = ecdf(x1, x_label='cm') # second ecdf x2 = X[:, 1] ax, _, _ = ecdf(x2, ax=ax) plt.legend(['sepal length', 'sepal width']) plt.show()","title":"Example 2 - Multiple ECDFs"},{"location":"user_guide/plotting/ecdf/#example-3-ecdf-with-percentile-thresholds","text":"from mlxtend.data import iris_data from mlxtend.plotting import ecdf import matplotlib.pyplot as plt X, y = iris_data() ax, threshold, count = ecdf(x=X[:, 0], x_label='sepal length (cm)', percentile=0.8) plt.show() print('Feature threshold at the 80th percentile:', threshold) print('Number of samples below the threshold:', count) Feature threshold at the 80th percentile: 6.5 Number of samples below the threshold: 120","title":"Example 3 - ECDF with Percentile Thresholds"},{"location":"user_guide/plotting/ecdf/#api","text":"ecdf(x, y_label='ECDF', x_label=None, ax=None, percentile=None, ecdf_color=None, ecdf_marker='o', percentile_color='black', percentile_linestyle='--') Plots an Empirical Cumulative Distribution Function Parameters x : array or list, shape=[n_samples,] Array-like object containing the feature values
y_label : str (default='ECDF') Text label for the y-axis x_label : str (default=None) Text label for the x-axis ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None percentile : float (default=None) Float between 0 and 1 for plotting a percentile threshold line ecdf_color : matplotlib color (default=None) Color for the ECDF plot; uses matplotlib defaults if None ecdf_marker : matplotlib marker (default='o') Marker style for the ECDF plot percentile_color : matplotlib color (default='black') Color for the percentile threshold if percentile is not None percentile_linestyle : matplotlib linestyle (default='--') Line style for the percentile threshold if percentile is not None Returns ax : matplotlib.axes.Axes object percentile_threshold : float Feature threshold at the percentile or None if percentile=None percentile_count : int or None Number of samples whose feature value is less than or equal to the feature threshold at the given percentile, or None if percentile=None Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/ecdf/","title":"API"},{"location":"user_guide/plotting/enrichment_plot/","text":"Enrichment Plot A function to plot step plots of cumulative counts. from mlxtend.plotting import enrichment_plot Overview In enrichment plots, the y-axis can be interpreted as \"how many samples are less than or equal to the corresponding x-axis label.\" References - Example 1 - Enrichment Plots from Pandas DataFrames import pandas as pd s1 = [1.1, 1.5] s2 = [2.1, 1.8] s3 = [3.1, 2.1] s4 = [3.9, 2.5] data = [s1, s2, s3, s4] df = pd.DataFrame(data, columns=['X1', 'X2']) df X1 X2 0 1.1 1.5 1 2.1 1.8 2 3.1 2.1 3 3.9 2.5 Plotting the data, where each column of the DataFrame (here: X1 and X2) is treated as a separate category.
import matplotlib.pyplot as plt from mlxtend.plotting import enrichment_plot ax = enrichment_plot(df, legend_loc='upper left') API enrichment_plot(df, colors='bgrkcy', markers=' ', linestyles='-', alpha=0.5, lw=2, where='post', grid=True, count_label='Count', xlim='auto', ylim='auto', invert_axes=False, legend_loc='best', ax=None) Plot step plots of cumulative counts Parameters df : pandas.DataFrame A pandas DataFrame where columns represent the different categories. colors : str (default: 'bgrkcy') The colors of the step lines. markers : str (default: ' ') Matplotlib marker styles, e.g., 'sov' for square, circle, and triangle markers. linestyles : str (default: '-') Matplotlib linestyles, e.g., '-,--' to cycle normal and dashed lines. Note that the different linestyles need to be separated by commas. alpha : float (default: 0.5) Transparency level from 0.0 to 1.0. lw : int or float (default: 2) Linewidth parameter. where : {'post', 'pre', 'mid'} (default: 'post') Starting location of the steps. grid : bool (default: True) Plots a grid if True. count_label : str (default: 'Count') Label for the \"Count\"-axis. xlim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the x-axis range. ylim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the y-axis range. invert_axes : bool (default: False) Plots count on the x-axis if True. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right} No legend if legend_loc=False ax : matplotlib axis, optional (default: None) Axis to plot on; a new one is created if None Returns ax : matplotlib axis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/enrichment_plot/","title":"Enrichment Plot"},{"location":"user_guide/plotting/enrichment_plot/#enrichment-plot","text":"A function to plot step plots of cumulative counts.
from mlxtend.plotting import enrichment_plot","title":"Enrichment Plot"},{"location":"user_guide/plotting/enrichment_plot/#overview","text":"In enrichment plots, the y-axis can be interpreted as \"how many samples are less than or equal to the corresponding x-axis label.\"","title":"Overview"},{"location":"user_guide/plotting/enrichment_plot/#references","text":"-","title":"References"},{"location":"user_guide/plotting/enrichment_plot/#example-1-enrichment-plots-from-pandas-dataframes","text":"import pandas as pd s1 = [1.1, 1.5] s2 = [2.1, 1.8] s3 = [3.1, 2.1] s4 = [3.9, 2.5] data = [s1, s2, s3, s4] df = pd.DataFrame(data, columns=['X1', 'X2']) df X1 X2 0 1.1 1.5 1 2.1 1.8 2 3.1 2.1 3 3.9 2.5 Plotting the data, where each column of the DataFrame (here: X1 and X2) is treated as a separate category. import matplotlib.pyplot as plt from mlxtend.plotting import enrichment_plot ax = enrichment_plot(df, legend_loc='upper left')","title":"Example 1 - Enrichment Plots from Pandas DataFrames"},{"location":"user_guide/plotting/enrichment_plot/#api","text":"enrichment_plot(df, colors='bgrkcy', markers=' ', linestyles='-', alpha=0.5, lw=2, where='post', grid=True, count_label='Count', xlim='auto', ylim='auto', invert_axes=False, legend_loc='best', ax=None) Plot step plots of cumulative counts Parameters df : pandas.DataFrame A pandas DataFrame where columns represent the different categories. colors : str (default: 'bgrkcy') The colors of the step lines. markers : str (default: ' ') Matplotlib marker styles, e.g., 'sov' for square, circle, and triangle markers. linestyles : str (default: '-') Matplotlib linestyles, e.g., '-,--' to cycle normal and dashed lines. Note that the different linestyles need to be separated by commas. alpha : float (default: 0.5) Transparency level from 0.0 to 1.0. lw : int or float (default: 2) Linewidth parameter.
where : {'post', 'pre', 'mid'} (default: 'post') Starting location of the steps. grid : bool (default: True) Plots a grid if True. count_label : str (default: 'Count') Label for the \"Count\"-axis. xlim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the x-axis range. ylim : 'auto' or array-like [min, max] (default: 'auto') Minimum and maximum position of the y-axis range. invert_axes : bool (default: False) Plots count on the x-axis if True. legend_loc : str (default: 'best') Location of the plot legend {best, upper left, upper right, lower left, lower right} No legend if legend_loc=False ax : matplotlib axis, optional (default: None) Axis to plot on; a new one is created if None Returns ax : matplotlib axis Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/enrichment_plot/","title":"API"},{"location":"user_guide/plotting/plot_confusion_matrix/","text":"Confusion Matrix Utility function for visualizing confusion matrices via matplotlib. from mlxtend.plotting import plot_confusion_matrix Overview Confusion Matrix For more information on confusion matrices, please see mlxtend.evaluate.confusion_matrix.
References - Example 1 - Binary from mlxtend.plotting import plot_confusion_matrix import matplotlib.pyplot as plt import numpy as np binary = np.array([[4, 1], [1, 2]]) fig, ax = plot_confusion_matrix(conf_mat=binary) plt.show() Example 2 - Binary absolute and relative with colorbar binary = np.array([[4, 1], [1, 2]]) fig, ax = plot_confusion_matrix(conf_mat=binary, show_absolute=True, show_normed=True, colorbar=True) plt.show() Example 3 - Multiclass relative multiclass = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]) fig, ax = plot_confusion_matrix(conf_mat=multiclass, colorbar=True, show_absolute=False, show_normed=True) plt.show() API plot_confusion_matrix(conf_mat, hide_spines=False, hide_ticks=False, figsize=None, cmap=None, colorbar=False, show_absolute=True, show_normed=False) Plot a confusion matrix via matplotlib. Parameters conf_mat : array-like, shape = [n_classes, n_classes] Confusion matrix from evaluate.confusion_matrix. hide_spines : bool (default: False) Hides axis spines if True. hide_ticks : bool (default: False) Hides axis ticks if True figsize : tuple (default: (2.5, 2.5)) Width and height of the figure cmap : matplotlib colormap (default: None) Uses matplotlib.pyplot.cm.Blues if None colorbar : bool (default: False) Shows a colorbar if True show_absolute : bool (default: True) Shows absolute confusion matrix coefficients if True. At least one of show_absolute or show_normed must be True. show_normed : bool (default: False) Shows normed confusion matrix coefficients if True. The normed confusion matrix coefficients give the proportion of training examples per class that are assigned the correct label. At least one of show_absolute or show_normed must be True. Returns fig, ax : matplotlib.pyplot subplot objects Figure and axis elements of the subplot.
Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_confusion_matrix/","title":"Confusion Matrix"},{"location":"user_guide/plotting/plot_confusion_matrix/#confusion-matrix","text":"Utility function for visualizing confusion matrices via matplotlib from mlxtend.plotting import plot_confusion_matrix","title":"Confusion Matrix"},{"location":"user_guide/plotting/plot_confusion_matrix/#overview","text":"","title":"Overview"},{"location":"user_guide/plotting/plot_confusion_matrix/#confusion-matrix_1","text":"For more information on confusion matrices, please see mlxtend.evaluate.confusion_matrix .","title":"Confusion Matrix"},{"location":"user_guide/plotting/plot_confusion_matrix/#references","text":"-","title":"References"},{"location":"user_guide/plotting/plot_confusion_matrix/#example-1-binary","text":"from mlxtend.plotting import plot_confusion_matrix import matplotlib.pyplot as plt import numpy as np binary = np.array([[4, 1], [1, 2]]) fig, ax = plot_confusion_matrix(conf_mat=binary) plt.show()","title":"Example 1 - Binary"},{"location":"user_guide/plotting/plot_confusion_matrix/#example-2-binary-absolute-and-relative-with-colorbar","text":"binary = np.array([[4, 1], [1, 2]]) fig, ax = plot_confusion_matrix(conf_mat=binary, show_absolute=True, show_normed=True, colorbar=True) plt.show()","title":"Example 2 - Binary absolute and relative with colorbar"},{"location":"user_guide/plotting/plot_confusion_matrix/#example-3-multiclass-relative","text":"multiclass = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]) fig, ax = plot_confusion_matrix(conf_mat=multiclass, colorbar=True, show_absolute=False, show_normed=True) plt.show()","title":"Example 3 - Multiclass relative"},{"location":"user_guide/plotting/plot_confusion_matrix/#api","text":"plot_confusion_matrix(conf_mat, hide_spines=False, hide_ticks=False, figsize=None, cmap=None, colorbar=False, show_absolute=True, show_normed=False) Plot a confusion matrix 
via matplotlib. Parameters conf_mat : array-like, shape = [n_classes, n_classes] Confusion matrix from evaluate.confusion_matrix. hide_spines : bool (default: False) Hides axis spines if True. hide_ticks : bool (default: False) Hides axis ticks if True figsize : tuple (default: (2.5, 2.5)) Width and height of the figure cmap : matplotlib colormap (default: None) Uses matplotlib.pyplot.cm.Blues if None colorbar : bool (default: False) Shows a colorbar if True show_absolute : bool (default: True) Shows absolute confusion matrix coefficients if True. At least one of show_absolute or show_normed must be True. show_normed : bool (default: False) Shows normed confusion matrix coefficients if True. The normed confusion matrix coefficients give the proportion of training examples per class that are assigned the correct label. At least one of show_absolute or show_normed must be True. Returns fig, ax : matplotlib.pyplot subplot objects Figure and axis elements of the subplot. Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_confusion_matrix/","title":"API"},{"location":"user_guide/plotting/plot_decision_regions/","text":"Plotting Decision Regions A function for plotting decision regions of classifiers in 1 or 2 dimensions.
from mlxtend.plotting import plot_decision_regions References Example 1 - Decision regions in 2D from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X = iris.data[:, [0, 2]] y = iris.target # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X, y) # Plotting decision regions plot_decision_regions(X, y, clf=svm, legend=2) # Adding axes annotations plt.xlabel('sepal length [cm]') plt.ylabel('petal length [cm]') plt.title('SVM on Iris') plt.show() Example 2 - Decision regions in 1D from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X = iris.data[:, 2] X = X[:, None] y = iris.target # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X, y) # Plotting decision regions plot_decision_regions(X, y, clf=svm, legend=2) # Adding axes annotations plt.xlabel('petal length [cm]') plt.title('SVM on Iris') plt.show() Example 3 - Decision Region Grids from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from sklearn.svm import SVC from sklearn import datasets import numpy as np # Initializing Classifiers clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() clf4 = SVC() # Loading some example data iris = datasets.load_iris() X = iris.data[:, [0,2]] y = iris.target import matplotlib.pyplot as plt from mlxtend.plotting import plot_decision_regions import matplotlib.gridspec as gridspec import itertools gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'SVM'] for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels, itertools.product([0, 1],
repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show() Example 4 - Highlighting Test Data Points from mlxtend.plotting import plot_decision_regions from mlxtend.preprocessing import shuffle_arrays_unison import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X, y = iris.data[:, [0,2]], iris.target X, y = shuffle_arrays_unison(arrays=[X, y], random_seed=3) X_train, y_train = X[:100], y[:100] X_test, y_test = X[100:], y[100:] # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X_train, y_train) # Plotting decision regions plot_decision_regions(X, y, clf=svm, legend=2, X_highlight=X_test) # Adding axes annotations plt.xlabel('sepal length [cm]') plt.ylabel('petal length [cm]') plt.title('SVM on Iris') plt.show() Example 5 - Evaluating Classifier Behavior on Non-Linear Problems from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from sklearn.svm import SVC # Initializing Classifiers clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(n_estimators=100, random_state=1) clf3 = GaussianNB() clf4 = SVC() # Loading Plotting Utilities import matplotlib.pyplot as plt import matplotlib.gridspec as gridspec import itertools from mlxtend.plotting import plot_decision_regions import numpy as np XOR xx, yy = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50)) rng = np.random.RandomState(0) X = rng.randn(300, 2) y = np.array(np.logical_xor(X[:, 0] > 0, X[:, 1] > 0), dtype=int) gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'SVM'] for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show() 
Half-Moons from sklearn.datasets import make_moons X, y = make_moons(n_samples=100, random_state=123) gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'SVM'] for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show() Concentric Circles from sklearn.datasets import make_circles X, y = make_circles(n_samples=1000, random_state=123, noise=0.1, factor=0.2) gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'SVM'] for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show() Example 6 - Working with existing axes objects using subplots import matplotlib.pyplot as plt from mlxtend.plotting import plot_decision_regions from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn import datasets import numpy as np # Loading some example data iris = datasets.load_iris() X = iris.data[:, 2] X = X[:, None] y = iris.target # Initializing and fitting classifiers clf1 = LogisticRegression(random_state=1) clf2 = GaussianNB() clf1.fit(X, y) clf2.fit(X, y) fig, axes = plt.subplots(1, 2, figsize=(10, 3)) fig = plot_decision_regions(X=X, y=y, clf=clf1, ax=axes[0], legend=2) fig = plot_decision_regions(X=X, y=y, clf=clf2, ax=axes[1], legend=1) plt.show() Example 7 - Decision regions with more than two training features from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data X, y = datasets.make_blobs(n_samples=600, n_features=3, centers=[[2, 2, -2],[-2, -2, 2]], cluster_std=[2, 2], random_state=2) # Training a classifier svm = SVC() svm.fit(X, y) # Plotting decision regions fig, ax = plt.subplots() # Decision region for feature 3 = 1.5 value = 1.5 # Plot training samples with feature 3 = 1.5 +/- 0.75 width = 0.75 plot_decision_regions(X, y, clf=svm, filler_feature_values={2: value}, filler_feature_ranges={2: width}, legend=2, ax=ax) ax.set_xlabel('Feature 1') ax.set_ylabel('Feature 2') ax.set_title('Feature 3 = {}'.format(value)) # Adding axes annotations fig.suptitle('SVM on make_blobs') plt.show() Example 8 - Grid of decision region slices from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data X, y = datasets.make_blobs(n_samples=500, n_features=3, centers=[[2, 2, -2],[-2, -2, 2]], cluster_std=[2, 2], random_state=2) # Training a classifier svm = SVC() svm.fit(X, y) # Plotting decision regions fig, axarr = plt.subplots(2, 2, figsize=(10,8), sharex=True, sharey=True) values = [-4.0, -1.0, 1.0, 4.0] width = 0.75 for value, ax in zip(values, axarr.flat): plot_decision_regions(X, y, clf=svm, filler_feature_values={2: value}, filler_feature_ranges={2: width}, legend=2, ax=ax) ax.set_xlabel('Feature 1') ax.set_ylabel('Feature 2') ax.set_title('Feature 3 = {}'.format(value)) # Adding axes annotations fig.suptitle('SVM on make_blobs') plt.show() Example 9 - Customizing the plotting style from mlxtend.plotting import plot_decision_regions from mlxtend.preprocessing import shuffle_arrays_unison import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X = iris.data[:, [0, 2]] y = iris.target X, y = shuffle_arrays_unison(arrays=[X, y], random_seed=3) X_train, y_train = X[:100], y[:100] X_test, y_test = X[100:], y[100:] # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X_train, y_train) # Specify keyword arguments to be passed to underlying plotting functions scatter_kwargs = {'s': 120, 'edgecolor': None, 'alpha': 0.7} contourf_kwargs = {'alpha': 0.2} scatter_highlight_kwargs = {'s': 120, 'label': 'Test data', 'alpha': 0.7} # Plotting decision regions plot_decision_regions(X, y, clf=svm, legend=2, X_highlight=X_test, scatter_kwargs=scatter_kwargs, contourf_kwargs=contourf_kwargs, scatter_highlight_kwargs=scatter_highlight_kwargs) # Adding axes annotations plt.xlabel('sepal length [cm]') plt.ylabel('petal length [cm]') plt.title('SVM on Iris') plt.show() Example 10 - Providing your own legend labels Custom legend labels can be provided by keeping the axis object returned by the plot_decision_regions function and retrieving the legend handles and labels from it. The handles can then be passed to ax.legend together with your own labels: ax = plot_decision_regions(X, y, clf=svm, legend=0) handles, labels = ax.get_legend_handles_labels() ax.legend(handles, ['class 0', 'class 1', 'class 2'], framealpha=0.3, scatterpoints=1) An example is shown below. 
from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X = iris.data[:, [0, 2]] y = iris.target # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X, y) # Plotting decision regions ax = plot_decision_regions(X, y, clf=svm, legend=0) # Adding axes annotations plt.xlabel('sepal length [cm]') plt.ylabel('petal length [cm]') plt.title('SVM on Iris') handles, labels = ax.get_legend_handles_labels() ax.legend(handles, ['class square', 'class triangle', 'class circle'], framealpha=0.3, scatterpoints=1) plt.show() API plot_decision_regions(X, y, clf, feature_index=None, filler_feature_values=None, filler_feature_ranges=None, ax=None, X_highlight=None, res=None, legend=1, hide_spines=True, markers='s^oxv<>', colors='#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf', scatter_kwargs=None, contourf_kwargs=None, scatter_highlight_kwargs=None) Plot decision regions of a classifier. Please note that this function assumes that class labels are labeled consecutively, e.g., 0, 1, 2, 3, 4, and 5. If your data have more classes than the default colors and markers can distinguish, you may want to provide additional colors and/or markers via the colors and markers arguments. See http://matplotlib.org/examples/color/named_colors.html for more information. Parameters X : array-like, shape = [n_samples, n_features] Feature matrix. y : array-like, shape = [n_samples] True class labels. clf : Classifier object. Must have a .predict method. feature_index : array-like (default: (0,) for 1D, (0, 1) otherwise) Feature indices to use for plotting. The first index in feature_index will be on the x-axis, the second index will be on the y-axis. filler_feature_values : dict (default: None) Only needed if the number of features is greater than 2. Dictionary of feature index-value pairs for the features not being plotted. 
filler_feature_ranges : dict (default: None) Only needed if the number of features is greater than 2. Dictionary of feature index-range pairs for the features not being plotted. The ranges are used to select the training samples for plotting: only samples whose value for such a feature lies within the corresponding filler_feature_values entry +/- the range are shown. ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None. X_highlight : array-like, shape = [n_samples, n_features] (default: None) An array with data points that are used to highlight samples in X. res : float or array-like, shape = (2,) (default: None) This parameter was used to define the grid width, but it has been deprecated in favor of determining the number of points given the figure DPI and size automatically for optimal results and computational efficiency. To increase the resolution, it is recommended to provide a dpi argument via matplotlib, e.g., plt.figure(dpi=600). hide_spines : bool (default: True) Hide axis spines if True. legend : int (default: 1) Integer to specify the legend location. No legend if legend is 0. markers : str (default: 's^oxv<>') Scatterplot markers. colors : str (default: '#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf') Comma-separated list of colors. scatter_kwargs : dict (default: None) Keyword arguments for underlying matplotlib scatter function. contourf_kwargs : dict (default: None) Keyword arguments for underlying matplotlib contourf function. scatter_highlight_kwargs : dict (default: None) Keyword arguments for the underlying matplotlib scatter function used to plot the X_highlight points. Returns ax : matplotlib.axes.Axes object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/","title":"Plotting Decision Regions"},{"location":"user_guide/plotting/plot_decision_regions/#plotting-decision-regions","text":"A function for plotting decision regions of classifiers in 1 or 2 dimensions. 
from mlxtend.plotting import plot_decision_regions","title":"Plotting Decision Regions"},{"location":"user_guide/plotting/plot_decision_regions/#references","text":"","title":"References"},{"location":"user_guide/plotting/plot_decision_regions/#example-1-decision-regions-in-2d","text":"from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X = iris.data[:, [0, 2]] y = iris.target # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X, y) # Plotting decision regions plot_decision_regions(X, y, clf=svm, legend=2) # Adding axes annotations plt.xlabel('sepal length [cm]') plt.ylabel('petal length [cm]') plt.title('SVM on Iris') plt.show()","title":"Example 1 - Decision regions in 2D"},{"location":"user_guide/plotting/plot_decision_regions/#example-2-decision-regions-in-1d","text":"from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X = iris.data[:, 2] X = X[:, None] y = iris.target # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X, y) # Plotting decision regions plot_decision_regions(X, y, clf=svm, legend=2) # Adding axes annotations plt.xlabel('petal length [cm]') plt.title('SVM on Iris') plt.show()","title":"Example 2 - Decision regions in 1D"},{"location":"user_guide/plotting/plot_decision_regions/#example-3-decision-region-grids","text":"from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from sklearn.svm import SVC from sklearn import datasets import numpy as np # Initializing Classifiers clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(random_state=1) clf3 = GaussianNB() clf4 = SVC() # Loading some example data iris = 
datasets.load_iris() X = iris.data[:, [0,2]] y = iris.target import matplotlib.pyplot as plt from mlxtend.plotting import plot_decision_regions import matplotlib.gridspec as gridspec import itertools gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'SVM'] for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show()","title":"Example 3 - Decision Region Grids"},{"location":"user_guide/plotting/plot_decision_regions/#example-4-highlighting-test-data-points","text":"from mlxtend.plotting import plot_decision_regions from mlxtend.preprocessing import shuffle_arrays_unison import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X, y = iris.data[:, [0,2]], iris.target X, y = shuffle_arrays_unison(arrays=[X, y], random_seed=3) X_train, y_train = X[:100], y[:100] X_test, y_test = X[100:], y[100:] # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X_train, y_train) # Plotting decision regions plot_decision_regions(X, y, clf=svm, legend=2, X_highlight=X_test) # Adding axes annotations plt.xlabel('sepal length [cm]') plt.ylabel('petal length [cm]') plt.title('SVM on Iris') plt.show()","title":"Example 4 - Highlighting Test Data Points"},{"location":"user_guide/plotting/plot_decision_regions/#example-5-evaluating-classifier-behavior-on-non-linear-problems","text":"from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn.ensemble import RandomForestClassifier from sklearn.svm import SVC # Initializing Classifiers clf1 = LogisticRegression(random_state=1) clf2 = RandomForestClassifier(n_estimators=100, random_state=1) clf3 = GaussianNB() clf4 = SVC() # Loading Plotting Utilities import matplotlib.pyplot as plt import matplotlib.gridspec as gridspec import itertools from mlxtend.plotting import plot_decision_regions import numpy as np","title":"Example 5 - Evaluating Classifier Behavior on Non-Linear Problems"},{"location":"user_guide/plotting/plot_decision_regions/#xor","text":"xx, yy = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50)) rng = np.random.RandomState(0) X = rng.randn(300, 2) y = np.array(np.logical_xor(X[:, 0] > 0, X[:, 1] > 0), dtype=int) gs = gridspec.GridSpec(2, 2) fig = 
plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'SVM'] for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show()","title":"XOR"},{"location":"user_guide/plotting/plot_decision_regions/#half-moons","text":"from sklearn.datasets import make_moons X, y = make_moons(n_samples=100, random_state=123) gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'SVM'] for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show()","title":"Half-Moons"},{"location":"user_guide/plotting/plot_decision_regions/#concentric-circles","text":"from sklearn.datasets import make_circles X, y = make_circles(n_samples=1000, random_state=123, noise=0.1, factor=0.2) gs = gridspec.GridSpec(2, 2) fig = plt.figure(figsize=(10,8)) labels = ['Logistic Regression', 'Random Forest', 'Naive Bayes', 'SVM'] for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels, itertools.product([0, 1], repeat=2)): clf.fit(X, y) ax = plt.subplot(gs[grd[0], grd[1]]) fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2) plt.title(lab) plt.show()","title":"Concentric Circles"},{"location":"user_guide/plotting/plot_decision_regions/#example-6-working-with-existing-axes-objects-using-subplots","text":"import matplotlib.pyplot as plt from mlxtend.plotting import plot_decision_regions from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import GaussianNB from sklearn import datasets import numpy as np # Loading some example data iris = datasets.load_iris() X = iris.data[:, 2] X = X[:, None] y = iris.target # Initializing and fitting classifiers clf1 = LogisticRegression(random_state=1) clf2 = GaussianNB() clf1.fit(X, y) clf2.fit(X, y) fig, axes = plt.subplots(1, 2, figsize=(10, 3)) fig = plot_decision_regions(X=X, y=y, clf=clf1, ax=axes[0], legend=2) fig = plot_decision_regions(X=X, y=y, clf=clf2, ax=axes[1], legend=1) plt.show()","title":"Example 6 - Working with existing axes objects using subplots"},{"location":"user_guide/plotting/plot_decision_regions/#example-7-decision-regions-with-more-than-two-training-features","text":"from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data X, y = datasets.make_blobs(n_samples=600, n_features=3, centers=[[2, 2, -2],[-2, -2, 2]], cluster_std=[2, 2], random_state=2) # Training a classifier svm = SVC() svm.fit(X, y) # Plotting decision regions fig, ax = plt.subplots() # Decision region for feature 3 = 1.5 value = 1.5 # Plot training samples with feature 3 = 1.5 +/- 0.75 width = 0.75 plot_decision_regions(X, y, clf=svm, filler_feature_values={2: value}, filler_feature_ranges={2: width}, legend=2, ax=ax) ax.set_xlabel('Feature 1') ax.set_ylabel('Feature 2') ax.set_title('Feature 3 = {}'.format(value)) # Adding axes annotations fig.suptitle('SVM on make_blobs') plt.show()","title":"Example 7 - Decision regions with more than two training features"},{"location":"user_guide/plotting/plot_decision_regions/#example-8-grid-of-decision-region-slices","text":"from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data X, y = datasets.make_blobs(n_samples=500, n_features=3, centers=[[2, 2, -2],[-2, -2, 2]], cluster_std=[2, 2], random_state=2) # Training a classifier svm = SVC() svm.fit(X, y) # Plotting decision regions fig, axarr = plt.subplots(2, 2, figsize=(10,8), sharex=True, sharey=True) values = [-4.0, -1.0, 1.0, 4.0] width = 0.75 for value, ax in zip(values, axarr.flat): plot_decision_regions(X, y, clf=svm, filler_feature_values={2: value}, filler_feature_ranges={2: width}, legend=2, ax=ax) ax.set_xlabel('Feature 1') ax.set_ylabel('Feature 2') ax.set_title('Feature 3 = {}'.format(value)) # Adding axes annotations fig.suptitle('SVM on make_blobs') plt.show()","title":"Example 8 - Grid of decision region slices"},{"location":"user_guide/plotting/plot_decision_regions/#example-9-customizing-the-plotting-style","text":"from mlxtend.plotting import plot_decision_regions from mlxtend.preprocessing import shuffle_arrays_unison import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X = iris.data[:, [0, 2]] y = iris.target X, y = shuffle_arrays_unison(arrays=[X, y], random_seed=3) X_train, y_train = X[:100], y[:100] X_test, y_test = X[100:], y[100:] # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X_train, y_train) # Specify keyword arguments to be passed to underlying plotting functions scatter_kwargs = {'s': 120, 'edgecolor': None, 'alpha': 0.7} contourf_kwargs = {'alpha': 0.2} scatter_highlight_kwargs = {'s': 120, 'label': 'Test data', 'alpha': 0.7} # Plotting decision regions plot_decision_regions(X, y, clf=svm, legend=2, X_highlight=X_test, scatter_kwargs=scatter_kwargs, contourf_kwargs=contourf_kwargs, scatter_highlight_kwargs=scatter_highlight_kwargs) # Adding axes annotations plt.xlabel('sepal length [cm]') plt.ylabel('petal length [cm]') plt.title('SVM on Iris') plt.show()","title":"Example 9 - Customizing the plotting style"},{"location":"user_guide/plotting/plot_decision_regions/#example-10-providing-your-own-legend-labels","text":"Custom legend labels can be provided by keeping the axis object returned by the plot_decision_regions function and retrieving the legend handles and labels from it. The handles can then be passed to ax.legend together with your own labels: ax = plot_decision_regions(X, y, clf=svm, legend=0) handles, labels = ax.get_legend_handles_labels() ax.legend(handles, ['class 0', 'class 1', 'class 2'], framealpha=0.3, scatterpoints=1) An example is shown below. 
from mlxtend.plotting import plot_decision_regions import matplotlib.pyplot as plt from sklearn import datasets from sklearn.svm import SVC # Loading some example data iris = datasets.load_iris() X = iris.data[:, [0, 2]] y = iris.target # Training a classifier svm = SVC(C=0.5, kernel='linear') svm.fit(X, y) # Plotting decision regions ax = plot_decision_regions(X, y, clf=svm, legend=0) # Adding axes annotations plt.xlabel('sepal length [cm]') plt.ylabel('petal length [cm]') plt.title('SVM on Iris') handles, labels = ax.get_legend_handles_labels() ax.legend(handles, ['class square', 'class triangle', 'class circle'], framealpha=0.3, scatterpoints=1) plt.show()","title":"Example 10 - Providing your own legend labels"},{"location":"user_guide/plotting/plot_decision_regions/#api","text":"plot_decision_regions(X, y, clf, feature_index=None, filler_feature_values=None, filler_feature_ranges=None, ax=None, X_highlight=None, res=None, legend=1, hide_spines=True, markers='s^oxv<>', colors='#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf', scatter_kwargs=None, contourf_kwargs=None, scatter_highlight_kwargs=None) Plot decision regions of a classifier. Please note that this function assumes that class labels are labeled consecutively, e.g., 0, 1, 2, 3, 4, and 5. If your data have more classes than the default colors and markers can distinguish, you may want to provide additional colors and/or markers via the colors and markers arguments. See http://matplotlib.org/examples/color/named_colors.html for more information. Parameters X : array-like, shape = [n_samples, n_features] Feature matrix. y : array-like, shape = [n_samples] True class labels. clf : Classifier object. Must have a .predict method. feature_index : array-like (default: (0,) for 1D, (0, 1) otherwise) Feature indices to use for plotting. The first index in feature_index will be on the x-axis, the second index will be on the y-axis. filler_feature_values : dict (default: None) Only needed if the number of features is greater than 2. 
Dictionary of feature index-value pairs for the features not being plotted. filler_feature_ranges : dict (default: None) Only needed if the number of features is greater than 2. Dictionary of feature index-range pairs for the features not being plotted. The ranges are used to select the training samples for plotting: only samples whose value for such a feature lies within the corresponding filler_feature_values entry +/- the range are shown. ax : matplotlib.axes.Axes (default: None) An existing matplotlib Axes. Creates one if ax=None. X_highlight : array-like, shape = [n_samples, n_features] (default: None) An array with data points that are used to highlight samples in X. res : float or array-like, shape = (2,) (default: None) This parameter was used to define the grid width, but it has been deprecated in favor of determining the number of points given the figure DPI and size automatically for optimal results and computational efficiency. To increase the resolution, it is recommended to provide a dpi argument via matplotlib, e.g., plt.figure(dpi=600). hide_spines : bool (default: True) Hide axis spines if True. legend : int (default: 1) Integer to specify the legend location. No legend if legend is 0. markers : str (default: 's^oxv<>') Scatterplot markers. colors : str (default: '#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf') Comma-separated list of colors. scatter_kwargs : dict (default: None) Keyword arguments for underlying matplotlib scatter function. contourf_kwargs : dict (default: None) Keyword arguments for underlying matplotlib contourf function. scatter_highlight_kwargs : dict (default: None) Keyword arguments for the underlying matplotlib scatter function used to plot the X_highlight points. Returns ax : matplotlib.axes.Axes object Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/","title":"API"},{"location":"user_guide/plotting/plot_learning_curves/","text":"Plotting Learning Curves A function to plot learning curves for classifiers. Learning curves are extremely useful for analyzing whether a model is suffering from over- or under-fitting (high variance or high bias). 
The function can be imported via from mlxtend.plotting import plot_learning_curves References - Example 1 from mlxtend.plotting import plot_learning_curves import matplotlib.pyplot as plt from mlxtend.data import iris_data from mlxtend.preprocessing import shuffle_arrays_unison from sklearn.neighbors import KNeighborsClassifier import numpy as np # Loading some example data X, y = iris_data() X, y = shuffle_arrays_unison(arrays=[X, y], random_seed=123) X_train, X_test = X[:100], X[100:] y_train, y_test = y[:100], y[100:] clf = KNeighborsClassifier(n_neighbors=5) plot_learning_curves(X_train, y_train, X_test, y_test, clf) plt.show() API plot_learning_curves(X_train, y_train, X_test, y_test, clf, train_marker='o', test_marker='^', scoring='misclassification error', suppress_plot=False, print_model=True, style='fivethirtyeight', legend_loc='best') Plots learning curves of a classifier. Parameters X_train : array-like, shape = [n_samples, n_features] Feature matrix of the training dataset. y_train : array-like, shape = [n_samples] True class labels of the training dataset. X_test : array-like, shape = [n_samples, n_features] Feature matrix of the test dataset. y_test : array-like, shape = [n_samples] True class labels of the test dataset. clf : Classifier object. Must have .predict and .fit methods. train_marker : str (default: 'o') Marker for the training set line plot. test_marker : str (default: '^') Marker for the test set line plot. scoring : str (default: 'misclassification error') If not 'misclassification error', accepts the following metrics (from scikit-learn): {'accuracy', 'average_precision', 'f1_micro', 'f1_macro', 'f1_weighted', 'f1_samples', 'log_loss', 'precision', 'recall', 'roc_auc', 'adjusted_rand_score', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} suppress_plot : bool (default: False) Suppress matplotlib plots if True. Recommended for testing purposes.
print_model : bool (default: True) Print model parameters in plot title if True. style : str (default: 'fivethirtyeight') Matplotlib style. legend_loc : str (default: 'best') Where to place the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'} Returns errors : (training_error, test_error): tuple of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/learning_curves/","title":"Plotting Learning Curves"},{"location":"user_guide/plotting/plot_learning_curves/#plotting-learning-curves","text":"A function to plot learning curves for classifiers. Learning curves are useful for analyzing whether a model is suffering from over- or under-fitting (high variance or high bias). The function can be imported via from mlxtend.plotting import plot_learning_curves","title":"Plotting Learning Curves"},{"location":"user_guide/plotting/plot_learning_curves/#references","text":"-","title":"References"},{"location":"user_guide/plotting/plot_learning_curves/#example-1","text":"from mlxtend.plotting import plot_learning_curves import matplotlib.pyplot as plt from mlxtend.data import iris_data from mlxtend.preprocessing import shuffle_arrays_unison from sklearn.neighbors import KNeighborsClassifier import numpy as np # Loading some example data X, y = iris_data() X, y = shuffle_arrays_unison(arrays=[X, y], random_seed=123) X_train, X_test = X[:100], X[100:] y_train, y_test = y[:100], y[100:] clf = KNeighborsClassifier(n_neighbors=5) plot_learning_curves(X_train, y_train, X_test, y_test, clf) plt.show()","title":"Example 1"},{"location":"user_guide/plotting/plot_learning_curves/#api","text":"plot_learning_curves(X_train, y_train, X_test, y_test, clf, train_marker='o', test_marker='^', scoring='misclassification error', suppress_plot=False, print_model=True, style='fivethirtyeight', legend_loc='best') Plots learning curves of a classifier.
Parameters X_train : array-like, shape = [n_samples, n_features] Feature matrix of the training dataset. y_train : array-like, shape = [n_samples] True class labels of the training dataset. X_test : array-like, shape = [n_samples, n_features] Feature matrix of the test dataset. y_test : array-like, shape = [n_samples] True class labels of the test dataset. clf : Classifier object. Must have .predict and .fit methods. train_marker : str (default: 'o') Marker for the training set line plot. test_marker : str (default: '^') Marker for the test set line plot. scoring : str (default: 'misclassification error') If not 'misclassification error', accepts the following metrics (from scikit-learn): {'accuracy', 'average_precision', 'f1_micro', 'f1_macro', 'f1_weighted', 'f1_samples', 'log_loss', 'precision', 'recall', 'roc_auc', 'adjusted_rand_score', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} suppress_plot : bool (default: False) Suppress matplotlib plots if True. Recommended for testing purposes. print_model : bool (default: True) Print model parameters in plot title if True. style : str (default: 'fivethirtyeight') Matplotlib style. legend_loc : str (default: 'best') Where to place the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'} Returns errors : (training_error, test_error): tuple of lists Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/learning_curves/","title":"API"},{"location":"user_guide/plotting/plot_linear_regression/","text":"Linear Regression Plot A function to plot linear regression fits. from mlxtend.plotting import plot_linear_regression Overview The plot_linear_regression function is a convenience function that uses scikit-learn's linear_model.LinearRegression to fit a linear model and SciPy's stats.pearsonr to calculate the correlation coefficient.
References - Example 1 - Ordinary Least Squares Simple Linear Regression import matplotlib.pyplot as plt from mlxtend.plotting import plot_linear_regression import numpy as np X = np.array([4, 8, 13, 26, 31, 10, 8, 30, 18, 12, 20, 5, 28, 18, 6, 31, 12, 12, 27, 11, 6, 14, 25, 7, 13, 4, 15, 21, 15]) y = np.array([14, 24, 22, 59, 66, 25, 18, 60, 39, 32, 53, 18, 55, 41, 28, 61, 35, 36, 52, 23, 19, 25, 73, 16, 32, 14, 31, 43, 34]) intercept, slope, corr_coeff = plot_linear_regression(X, y) plt.show() API plot_linear_regression(X, y, model=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False), corr_func='pearsonr', scattercolor='blue', fit_style='k--', legend=True, xlim='auto') Plot a linear regression line fit. Parameters X : numpy array, shape = [n_samples,] Samples. y : numpy array, shape = [n_samples,] Target values. model: object (default: sklearn.linear_model.LinearRegression) Estimator object for regression. Must implement .fit() and .predict() methods. corr_func: str or function (default: 'pearsonr') Uses pearsonr from scipy.stats to compute the correlation coefficient if corr_func='pearsonr'. Otherwise, the corr_func parameter expects a function that takes the x and y arrays as inputs and returns a two-element tuple whose first element is the correlation coefficient (the same convention as scipy.stats.pearsonr). scattercolor: string (default: blue) Color of scatter plot points. fit_style: string (default: k--) Style for the line fit. legend: bool (default: True) Plots a legend with the correlation coefficient, fit coefficient, and intercept values. xlim: array-like (x_min, x_max) or 'auto' (default: 'auto') X-axis limits for the linear line fit. Returns regression_fit : tuple intercept, slope, corr_coeff (float, float, float) Examples For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/plotting/plot_linear_regression/","title":"Linear Regression Plot"},{"location":"user_guide/plotting/plot_linear_regression/#linear-regression-plot","text":"A function to plot linear regression fits.
from mlxtend.plotting import plot_linear_regression","title":"Linear Regression Plot"},{"location":"user_guide/plotting/plot_linear_regression/#overview","text":"The plot_linear_regression function is a convenience function that uses scikit-learn's linear_model.LinearRegression to fit a linear model and SciPy's stats.pearsonr to calculate the correlation coefficient.","title":"Overview"},{"location":"user_guide/plotting/plot_linear_regression/#references","text":"-","title":"References"},{"location":"user_guide/plotting/plot_linear_regression/#example-1-ordinary-least-squares-simple-linear-regression","text":"import matplotlib.pyplot as plt from mlxtend.plotting import plot_linear_regression import numpy as np X = np.array([4, 8, 13, 26, 31, 10, 8, 30, 18, 12, 20, 5, 28, 18, 6, 31, 12, 12, 27, 11, 6, 14, 25, 7, 13, 4, 15, 21, 15]) y = np.array([14, 24, 22, 59, 66, 25, 18, 60, 39, 32, 53, 18, 55, 41, 28, 61, 35, 36, 52, 23, 19, 25, 73, 16, 32, 14, 31, 43, 34]) intercept, slope, corr_coeff = plot_linear_regression(X, y) plt.show()","title":"Example 1 - Ordinary Least Squares Simple Linear Regression"},{"location":"user_guide/plotting/plot_linear_regression/#api","text":"plot_linear_regression(X, y, model=LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False), corr_func='pearsonr', scattercolor='blue', fit_style='k--', legend=True, xlim='auto') Plot a linear regression line fit. Parameters X : numpy array, shape = [n_samples,] Samples. y : numpy array, shape = [n_samples,] Target values. model: object (default: sklearn.linear_model.LinearRegression) Estimator object for regression. Must implement .fit() and .predict() methods. corr_func: str or function (default: 'pearsonr') Uses pearsonr from scipy.stats to compute the correlation coefficient if corr_func='pearsonr'. Otherwise, the corr_func parameter expects a function that takes the x and y arrays as inputs and is expected to return a tuple (,