The CHANGELOG for the current development version is available at https://github.com/rasbt/mlxtend/blob/master/docs/sources/CHANGELOG.md.
- A meaningful error message is now raised when a cross-validation generator is used with `SequentialFeatureSelector`. (#377)
- The `SequentialFeatureSelector` now accepts custom feature names via the `fit` method for more interpretable feature subset reports. (#379)
- The `SequentialFeatureSelector` is now also compatible with Pandas DataFrames and uses DataFrame column names for more interpretable feature subset reports. (#379)
- `ColumnSelector` now works with Pandas DataFrame columns. (#378 by Manuel Garrido)
- The `ExhaustiveFeatureSelector` estimator in `mlxtend.feature_selection` can now be safely stopped mid-process via Ctrl+C. (#380)
- For consistency, the `best_idx_` attribute of the `ExhaustiveFeatureSelector` was renamed to `k_feature_idx_`, matching the `SequentialFeatureSelector`. Likewise, `best_score_` was renamed to `k_score_`. (#380)
- Allow mlxtend estimators to be cloned via scikit-learn's `clone` function. (#374)
- A new `feature_importance_permutation` function to compute the feature importance in classifiers and regressors via the permutation importance method. (#358)
- The `fit` method of the `ExhaustiveFeatureSelector` now optionally accepts `**fit_params` for the estimator that is used for feature selection. (#354 by Zach Griffith)
- The `fit` method of the `SequentialFeatureSelector` now optionally accepts `**fit_params` for the estimator that is used for feature selection. (#350 by Zach Griffith)
- Replaced the `plot_decision_regions` colors with a colorblind-friendly palette and added contour lines for decision regions. (#348)
- All stacking estimators now raise `NotFittedError`s if any method for inference is called prior to fitting the estimator. (#353)
- Renamed the `refit` parameter of both the `StackingClassifier` and `StackingCVClassifier` to `use_clones` to be more explicit and less misleading. (#368)
- Various changes in the documentation and documentation tools to fix formatting issues (#363)
- Fixes a bug where the `StackingCVClassifier`'s meta-features were not stored in the original order when `shuffle=True`. (#370)
- Many documentation improvements, including links to the User Guides in the API docs. (#371)
- New function implementing the resampled paired t-test procedure (`paired_ttest_resampled`) to compare the performance of two models. (#323)
- New function implementing the k-fold paired t-test procedure (`paired_ttest_kfold_cv`) to compare the performance of two models (also called k-hold-out paired t-test). (#324)
- New function implementing the 5x2cv paired t-test procedure (`paired_ttest_5x2cv`) proposed by Dietterich (1998) to compare the performance of two models. (#325)
- A `refit` parameter was added to the stacking classes (similar to the `refit` parameter in the `EnsembleVoteClassifier`) to support classifiers and regressors that follow the scikit-learn API but are not compatible with scikit-learn's `clone` function. (#322)
- The `ColumnSelector` now has a `drop_axis` argument so it can be used in pipelines with `CountVectorizer`s. (#333)
- Raises an informative error message if `predict` or `predict_meta_features` is called prior to calling the `fit` method in `StackingRegressor` and `StackingCVRegressor`. (#315)
- The `plot_decision_regions` function now automatically determines the optimal setting based on the feature dimensions and supports anti-aliasing. The old `res` parameter has been deprecated. (#309 by Guillaume Poirier-Morency)
- Apriori code is faster due to an optimized one-hot transformation and a reduced number of candidates generated by the `apriori` algorithm. (#327 by Jakub Smid)
- The `OnehotTransactions` class (which is typically used in combination with the `apriori` function for association rule mining) is now more memory efficient as it uses boolean arrays instead of integer arrays. In addition, the `OnehotTransactions` class can now be provided with a `sparse` argument to generate sparse representations of the one-hot matrix to further improve memory efficiency. (#328 by Jakub Smid)
- The `OnehotTransactions` class has been deprecated and replaced by the `TransactionEncoder`. (#332)
- The `plot_decision_regions` function now has three new parameters, `scatter_kwargs`, `contourf_kwargs`, and `scatter_highlight_kwargs`, that can be used to modify the plotting style. (#342 by James Bourbeau)
- Fixed an issue when class labels were provided to the `EnsembleVoteClassifier` when `refit` was set to `False`. (#322)
- Allow arrays with 16-bit and 32-bit precision in the `plot_decision_regions` function. (#337)
- Fixed a bug that raised an indexing error if the number of items was <= 1 when computing association rules using the conviction metric. (#340)
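For context on the conviction fix above: conviction is defined as (1 − support(consequent)) / (1 − confidence(rule)), which diverges when a rule's confidence reaches 1. A minimal plain-Python sketch of the metric (the helper name is hypothetical, not mlxtend's implementation):

```python
def conviction(support_consequent, confidence):
    """Conviction = (1 - supp(C)) / (1 - conf(A -> C)).

    A rule that always holds (confidence == 1) has infinite
    conviction; guard against the zero denominator explicitly.
    """
    if confidence >= 1.0:
        return float("inf")
    return (1.0 - support_consequent) / (1.0 - confidence)

print(conviction(0.4, 0.75))  # (1 - 0.4) / (1 - 0.75) = 2.4
print(conviction(0.4, 1.0))   # inf
```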
- New `store_train_meta_features` parameter for `fit` in the `StackingCVRegressor`; if `True`, train meta-features are stored in `self.train_meta_features_`. A new `pred_meta_features` method for the `StackingCVRegressor` allows obtaining the test meta-features. (#294 via takashioya)
- The new `store_train_meta_features` attribute and `pred_meta_features` method for the `StackingCVRegressor` were also added to the `StackingRegressor`, `StackingClassifier`, and `StackingCVClassifier`. (#299 & #300)
- New function (`evaluate.mcnemar_tables`) for creating multiple 2x2 contingency tables from model prediction arrays that can be used in multiple McNemar (post-hoc) tests, Cochran's Q or F tests, etc. (#307)
- New function (`evaluate.cochrans_q`) for performing Cochran's Q test to compare the accuracy of multiple classifiers. (#310)
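The 2x2 contingency tables built by `evaluate.mcnemar_tables` pair up two models' correct and incorrect predictions. A toy sketch of the layout, assuming a (both correct / only model 1 correct / only model 2 correct / both wrong) convention — not the library's code:

```python
def mcnemar_table(y_true, y_model1, y_model2):
    """Build a 2x2 contingency table:
    [[both correct, m1 correct & m2 wrong],
     [m1 wrong & m2 correct, both wrong]]"""
    tb = [[0, 0], [0, 0]]
    for t, p1, p2 in zip(y_true, y_model1, y_model2):
        r1, r2 = p1 == t, p2 == t  # was each model right on this sample?
        tb[0 if r1 else 1][0 if r2 else 1] += 1
    return tb

y_true = [0, 0, 0, 1, 1, 1]
y_m1   = [0, 0, 1, 1, 1, 0]
y_m2   = [0, 1, 1, 1, 0, 0]
print(mcnemar_table(y_true, y_m1, y_m2))  # [[2, 2], [0, 2]]
```

The off-diagonal cells (samples where exactly one model is correct) are the only counts McNemar's test statistic actually uses.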
- Added `requirements.txt` to `setup.py`. (#304 via Colin Carrol)
- Improved numerical stability for p-values computed via the exact McNemar test. (#306)
- `nose` is no longer required to use the library. (#302)
- Added `mlxtend.evaluate.bootstrap_point632_score` to evaluate the performance of estimators using the .632 bootstrap. (#283)
- New `max_len` parameter for the frequent itemset generation via the `apriori` function to allow for early stopping. (#270)
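The .632 bootstrap behind `bootstrap_point632_score` blends the optimistic resubstitution accuracy with the pessimistic out-of-bag accuracy. A minimal sketch of just that weighting step (hypothetical helper, not the mlxtend implementation):

```python
def point632_estimate(resubstitution_acc, oob_accs):
    """Combine the resubstitution (optimistic) accuracy and the
    out-of-bag (pessimistic) bootstrap accuracies with the classic
    .632 weighting: 0.632 * acc_oob + 0.368 * acc_resub."""
    avg_oob = sum(oob_accs) / len(oob_accs)
    return 0.632 * avg_oob + 0.368 * resubstitution_acc

# e.g. perfect training accuracy but ~0.85 average out-of-bag accuracy
print(point632_estimate(1.0, [0.8, 0.9, 0.85]))  # ~0.9052
```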
- All feature index tuples in the `SequentialFeatureSelector` are now in sorted order. (#262)
- The `SequentialFeatureSelector` now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. (#262)
- `utils.Counter` now accepts a name variable to help distinguish between multiple counters, time precision can be set with the `precision` kwarg, and the new attribute `end_time` holds the time the last iteration completed. (#278 via Mathew Savage)
- Fixed a deprecation error that occurred with the McNemar test when using SciPy 1.0. (#283)
- Added `evaluate.permutation_test`, a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution; in other words, a procedure to test the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). (#250)
- Added `'leverage'` and `'conviction'` as evaluation metrics to the `frequent_patterns.association_rules` function. (#246 & #247)
- Added a `loadings_` attribute to `PrincipalComponentAnalysis` to compute the factor loadings of the features on the principal components. (#251)
- Allow grid search over classifiers/regressors in ensemble and stacking estimators. (#259)
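The idea behind `evaluate.permutation_test` can be sketched in a few lines of plain Python: pool the two samples, repeatedly re-split them at random, and count how often the permuted difference in means is at least as extreme as the observed one (approximate variant; a sketch, not mlxtend's code):

```python
import random

def permutation_test(sample_a, sample_b, num_rounds=5000, seed=0):
    """Approximate two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(num_rounds):
        rng.shuffle(pooled)  # random relabeling of the pooled observations
        perm = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if perm >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_rounds

treatment = [28.4, 30.1, 27.8, 31.2, 29.9]
control = [22.1, 23.5, 21.9, 24.0, 22.8]
print(permutation_test(treatment, control))  # small p-value: the groups differ
```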
- New `make_multiplexer_dataset` function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. (#263)
- Added a new `BootstrapOutOfBag` class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. (#265)
- The parameters for `StackingClassifier`, `StackingCVClassifier`, `StackingRegressor`, `StackingCVRegressor`, and `EnsembleVoteClassifier` can now be tuned using scikit-learn's `GridSearchCV`. (#254 via James Bourbeau)
- The `'support'` column returned by `frequent_patterns.association_rules` was changed to compute the support of "antecedant union consequent", and new `'antecedant support'` and `'consequent support'` columns were added to avoid ambiguity. (#245)
- Allow the `OnehotTransactions` to be cloned via scikit-learn's `clone` function, which is required by, e.g., scikit-learn's `FeatureUnion` or `GridSearchCV`. (#249 via Iaroslav Shcherbatyi)
- Fixed issues with the `self._init_time` parameter in `_IterativeModel` subclasses. (#256)
- Fixed an imprecision bug that occurred in `plot_ecdf` when run on Python 2.7. (#264)
- The vectors from SVD in `PrincipalComponentAnalysis` are now scaled so that the eigenvalues via `solver='eigen'` and `solver='svd'` have the same magnitudes. (#251)
- Added a `mlxtend.evaluate.bootstrap` function that implements the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth). (#232)
- The `SequentialFeatureSelector`'s `k_features` parameter now accepts the string arguments "best" or "parsimonious" for more "automated" feature selection. For instance, if "best" is provided, the feature selector returns the feature subset with the best cross-validation performance. If "parsimonious" is provided, the smallest feature subset that is within one standard error of the cross-validation performance is selected. (#238)
- The `SequentialFeatureSelector` now uses `np.nanmean` instead of the normal mean to support scorers that may return `np.nan`. (#211 via mrkaiser)
- The `skip_if_stuck` parameter was removed from the `SequentialFeatureSelector` in favor of a more efficient implementation that compares the conditional inclusion/exclusion results (in the floating versions) to the performances of previously sampled feature sets that were cached. (#237)
- The `ExhaustiveFeatureSelector` was modified to consume substantially less memory. (#195 via Adam Erickson)
- Fixed a bug where the `SequentialFeatureSelector` selected a feature subset larger than specified via the `k_features` tuple max-value. (#213)
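The "parsimonious" option described above applies the one-standard-error rule. A toy sketch of that selection logic over per-size cross-validation results (hypothetical data structure, not mlxtend's implementation):

```python
def parsimonious_k(cv_results):
    """Pick the smallest feature-subset size whose mean CV score lies
    within one standard error of the best subset's mean score.
    `cv_results` maps subset size k -> (mean_score, std_err)."""
    best_k = max(cv_results, key=lambda k: cv_results[k][0])
    best_mean, best_se = cv_results[best_k]
    threshold = best_mean - best_se
    candidates = [k for k, (mean, _) in cv_results.items() if mean >= threshold]
    return min(candidates)

# k=4 scores best (0.935), but k=3 is within one standard error (>= 0.915)
results = {1: (0.80, 0.02), 2: (0.91, 0.02), 3: (0.93, 0.02), 4: (0.935, 0.02)}
print(parsimonious_k(results))  # 3
```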
- New `mlxtend.plotting.ecdf` function for plotting empirical cumulative distribution functions. (#196)
- New `StackingCVRegressor` for stacking regressors with out-of-fold predictions to prevent overfitting. (#201 via Eike Dehling)
- The TensorFlow estimators have been removed from mlxtend, since TensorFlow now provides convenient ways to build estimators, which renders those implementations obsolete.
- `plot_decision_regions` now supports plotting decision regions for more than 2 training features. (#189 via James Bourbeau)
- Parallel execution in `mlxtend.feature_selection.SequentialFeatureSelector` and `mlxtend.feature_selection.ExhaustiveFeatureSelector` is now performed over different feature subsets instead of the different cross-validation folds to better utilize machines with multiple processors if the number of features is large. (#193 via @whalebot-helmsman)
- Raise meaningful error messages if pandas `DataFrame`s or Python lists of lists are fed into the `StackingCVClassifier` as `fit` arguments. (#198)
- The `n_folds` parameter of the `StackingCVClassifier` was changed to `cv` and can now accept any kind of cross-validation technique that is available from scikit-learn. For example, `StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3))` or `StackingCVClassifier(..., cv=GroupKFold(n_splits=3))`. (#203 via Konstantinos Paliouras)
- `SequentialFeatureSelector` now correctly accepts a `None` argument for the `scoring` parameter to infer the default scoring metric from scikit-learn classifiers and regressors. (#171)
- The `plot_decision_regions` function now supports pre-existing axes objects generated via matplotlib's `plt.subplots`. (#184, see example)
- Made `math.num_combinations` and `math.num_permutations` numerically stable for large numbers of combinations and permutations. (#200)
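One standard way to make combination counts numerically stable for large `n`, as described above, is to work in log-space via the log-gamma function. A sketch of the idea (not necessarily mlxtend's exact approach):

```python
from math import lgamma, exp

def num_combinations(n, k):
    """Compute C(n, k) = n! / (k! * (n - k)!) in log-space via lgamma,
    avoiding the huge intermediate factorials that overflow floats."""
    return round(exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)))

print(num_combinations(5, 2))   # 10
print(num_combinations(52, 5))  # 2598960
```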
- An `association_rules` function is implemented that makes it possible to generate rules based on a list of frequent itemsets. (via Joshua Goerner)
- Adds a black `edgecolor` to plots via `plotting.plot_decision_regions` to make markers more distinguishable from the background in `matplotlib>=2.0`.
- The `association` submodule was renamed to `frequent_patterns`.
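To illustrate what an `association_rules`-style function does with a list of frequent itemsets: each itemset is split into antecedent/consequent pairs and scored, e.g. by confidence = supp(itemset) / supp(antecedent). A toy sketch (hypothetical helper, not mlxtend's implementation):

```python
from itertools import combinations

def rules_from_itemset(itemset, support):
    """Generate all antecedent -> consequent splits of a frequent itemset
    and score them by confidence. `support` maps frozensets to supports."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):  # every non-empty proper subset
        for antecedent in combinations(sorted(itemset), r):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            confidence = support[itemset] / support[antecedent]
            rules.append((antecedent, consequent, confidence))
    return rules

support = {
    frozenset({"beer"}): 0.6,
    frozenset({"chips"}): 0.5,
    frozenset({"beer", "chips"}): 0.4,
}
for a, c, conf in rules_from_itemset({"beer", "chips"}, support):
    print(sorted(a), "->", sorted(c), round(conf, 3))
```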
- The `DataFrame` index of `apriori` results is now unique and ordered.
- Fixed typos in the autompg and wine datasets. (via James Bourbeau)
- The `EnsembleVoteClassifier` has a new `refit` attribute that prevents refitting classifiers if `refit=False` to save computational time.
- Added a new `lift_score` function in `evaluate` to compute lift score. (via Batuhan Bardak)
- `StackingClassifier` and `StackingRegressor` support multivariate targets if the underlying models do. (via kernc)
- `StackingClassifier` has a new `use_features_in_secondary` attribute like `StackingCVClassifier`.
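For reference, lift for classifier predictions can be computed as the precision of the positive predictions divided by the baseline rate of the positive class. A plain-Python sketch of that formula (not necessarily identical to `evaluate.lift_score`):

```python
def lift_score(y_true, y_pred, positive=1):
    """Lift = precision / baseline positive rate. Values > 1 mean the
    classifier targets positives better than random selection would."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    pred_pos = sum(1 for p in y_pred if p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    n = len(y_true)
    return (tp / pred_pos) / (actual_pos / n)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(lift_score(y_true, y_pred))  # (2/3) / (3/8) = 16/9 ~ 1.78
```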
- Changed the default verbosity level in `SequentialFeatureSelector` to 0.
- The `EnsembleVoteClassifier` now raises a `NotFittedError` if the estimator wasn't `fit` before calling `predict`. (via Anton Loss)
- Added new TensorFlow variable initialization syntax to guarantee compatibility with TensorFlow 1.0.
- Fixed a wrong default value for `k_features` in `SequentialFeatureSelector`.
- Cast selected feature subsets in the `SequentialFeatureSelector` as sets to prevent the iterator from getting stuck if the `k_idx` are different permutations of the same combination. (via Zac Wellmer)
- Fixed an issue with learning curves that caused the performance metrics to be reversed. (via ipashchenko)
- Fixed a bug that could occur in the `SequentialFeatureSelector` if there are similarly well-performing subsets in the floating variants. (via Zac Wellmer)
- New `ExhaustiveFeatureSelector` estimator in `mlxtend.feature_selection` for evaluating all feature combinations in a specified range.
- The `StackingClassifier` has a new parameter `average_probas` that is set to `True` by default to maintain the current behavior. A deprecation warning was added, though, and it will default to `False` in future releases (0.6.0); `average_probas=False` will result in stacking of the level-1 predicted probabilities rather than averaging them.
- New `StackingCVClassifier` estimator in `mlxtend.classifier` for implementing a stacking ensemble that uses cross-validation techniques for training the meta-estimator to avoid overfitting. (Reiichiro Nakano)
- New `OnehotTransactions` encoder class added to the `preprocessing` submodule for transforming transaction data into a one-hot encoded array.
- The `SequentialFeatureSelector` estimator in `mlxtend.feature_selection` can now be safely stopped mid-process via Ctrl+C; the `print_progress` parameter was deprecated in favor of a more tunable `verbose` parameter. (Will McGinnis)
- New `apriori` function in `association` to extract frequent itemsets from transaction data for association rule mining.
- New `checkerboard_plot` function in `plotting` to plot checkerboard tables / heat maps.
- New `mcnemar_table` and `mcnemar` functions in `evaluate` to compute 2x2 contingency tables and McNemar's test.
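The one-hot encoding performed by an `OnehotTransactions`-style encoder can be sketched as follows: collect the vocabulary of items, then mark each transaction's membership per item (toy illustration, not the library code):

```python
def encode_transactions(transactions):
    """One-hot encode a list of transactions into a boolean matrix with
    one column per distinct item (sorted for a deterministic layout)."""
    columns = sorted({item for t in transactions for item in t})
    matrix = [[item in set(t) for item in columns] for t in transactions]
    return columns, matrix

transactions = [["milk", "bread"], ["bread"], ["milk", "eggs"]]
columns, matrix = encode_transactions(transactions)
print(columns)  # ['bread', 'eggs', 'milk']
print(matrix)   # [[True, False, True], [True, False, False], [False, True, True]]
```

A boolean matrix like this is exactly the shape of input that frequent-itemset mining such as `apriori` consumes.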
- All plotting functions have been moved to `mlxtend.plotting` for compatibility reasons with continuous integration services and to make the installation of `matplotlib` optional for users of mlxtend's core functionality.
- Added a compatibility layer for scikit-learn 0.18 using the new `model_selection` module while maintaining backwards compatibility to scikit-learn 0.17.
- `mlxtend.plotting.plot_decision_regions` now draws decision regions correctly if more than 4 class labels are present.
- Raise an `AttributeError` in `plot_decision_regions` when the `X_highlight` argument is a 1D array. (chkoar)
- Added `preprocessing.CopyTransformer`, a mock class that returns copies of input arrays via `transform` and `fit_transform`.
- Added AppVeyor to CI to ensure MS Windows compatibility.
- Datasets are now saved as compressed .txt or .csv files rather than being imported as Python objects.
- `feature_selection.SequentialFeatureSelector` now supports the selection of `k_features` using a tuple to specify a "min-max" `k_features` range.
- Added an "SVD solver" option to the `PrincipalComponentAnalysis`.
- Raise an `AttributeError` with a "not fitted" message in `SequentialFeatureSelector` if `transform` or `get_metric_dict` are called prior to `fit`.
- Use small, positive bias units in `TfMultiLayerPerceptron`'s hidden layer(s) if the activations are ReLUs in order to avoid dead neurons.
- Added an optional `clone_estimator` parameter to the `SequentialFeatureSelector` that defaults to `True`, avoiding the modification of the original estimator objects.
- More rigorous type and shape checks in the `evaluate.plot_decision_regions` function.
- `DenseTransformer` no longer raises an error if the input array is not sparse.
- API clean-up, using scikit-learn's `BaseEstimator` as the parent class for `feature_selection.ColumnSelector`.
- Fixed a problem when a tuple range was provided as an argument to the `SequentialFeatureSelector`'s `k_features` parameter and the scoring metric was more negative than -1 (e.g., as in scikit-learn's MSE scoring function). (via [wahutch](https://github.com/wahutch))
- Fixed an `AttributeError` issue when `verbose` > 1 in `StackingClassifier`.
- Fixed a bug in `classifier.SoftmaxRegression` where the mean values of the offsets were used to update the bias units rather than their sum.
- Fixed a rare bug in the MLP `_layer_mapping` functions that caused a swap of the random number generation seed when initializing weights and biases.
- New TensorFlow estimator for Linear Regression (`tf_regressor.TfLinearRegression`).
- New k-means clustering estimator (`cluster.Kmeans`).
- New TensorFlow k-means clustering estimator (`tf_cluster.Kmeans`).
- Due to refactoring of the estimator classes, the `init_weights` parameter of the `fit` methods was globally renamed to `init_params`.
- Overall performance improvements of estimators due to code clean-up and refactoring
- Added several additional checks for correct array types and more meaningful exception messages
- Added optional `dropout` to the `tf_classifier.TfMultiLayerPerceptron` classifier for regularization.
- Added an optional `decay` parameter to the `tf_classifier.TfMultiLayerPerceptron` classifier for adaptive learning via an exponential decay of the learning rate eta.
- Replaced the old `NeuralNetMLP` with the more streamlined `MultiLayerPerceptron` (`classifier.MultiLayerPerceptron`), now also with softmax in the output layer and categorical cross-entropy loss.
- Unified the `init_params` parameter for fit functions to continue training where the algorithm left off (if supported).
- New `TfSoftmaxRegression` classifier using TensorFlow (`tf_classifier.TfSoftmaxRegression`).
- New `SoftmaxRegression` classifier (`classifier.SoftmaxRegression`).
- New `TfMultiLayerPerceptron` classifier using TensorFlow (`tf_classifier.TfMultiLayerPerceptron`).
- New `StackingRegressor` (`regressor.StackingRegressor`).
- New `StackingClassifier` (`classifier.StackingClassifier`).
- New function for one-hot encoding of class labels (`preprocessing.one_hot`).
- Added `GridSearch` support to the `SequentialFeatureSelector` (`feature_selection.SequentialFeatureSelector`).
- `evaluate.plot_decision_regions` improvements:
    - The function now handles y-class labels correctly if the array is of type `float`.
    - Correct handling of the input arguments `markers` and `colors`.
    - Accept an existing `Axes` via the `ax` argument.
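One-hot encoding of class labels, as provided by a `preprocessing.one_hot`-style function, maps each integer label to an indicator vector. A toy sketch of the transformation (not the library implementation):

```python
def one_hot(y, num_labels=None):
    """One-hot encode integer class labels: label i becomes a vector
    with 1.0 at position i and 0.0 elsewhere."""
    if num_labels is None:
        num_labels = max(y) + 1
    return [[1.0 if i == label else 0.0 for i in range(num_labels)]
            for label in y]

print(one_hot([0, 2, 1]))
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```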
- New `print_progress` parameter for all generalized models and multi-layer neural networks for printing the time elapsed, ETA, and the current cost of the current epoch.
- Minibatch learning for `classifier.LogisticRegression`, `classifier.Adaline`, and `regressor.LinearRegression`, plus a streamlined API.
- New Principal Component Analysis class via `mlxtend.feature_extraction.PrincipalComponentAnalysis`.
- New RBF Kernel Principal Component Analysis class via `mlxtend.feature_extraction.RBFKernelPCA`.
- New Linear Discriminant Analysis class via `mlxtend.feature_extraction.LinearDiscriminantAnalysis`.
- The `column` parameter in `mlxtend.preprocessing.standardize` now defaults to `None` to standardize all columns more conveniently.
- Added a progress bar tracker to `classifier.NeuralNetMLP`.
- Added a function to score predicted vs. target class labels: `evaluate.scoring`.
- Added confusion matrix functions to create (`evaluate.confusion_matrix`) and plot (`evaluate.plot_confusion_matrix`) confusion matrices.
- New style parameter and improved axis scaling in `mlxtend.evaluate.plot_learning_curves`.
- Added `loadlocal_mnist` to `mlxtend.data` for streaming MNIST from local byte files into NumPy arrays.
- New `NeuralNetMLP` parameters: `random_weights`, `shuffle_init`, `shuffle_epoch`.
- New `SFS` features such as the generation of pandas `DataFrame` results tables and plotting functions (with confidence intervals, standard deviation, and standard error bars).
- Added support for regression estimators in `SFS`.
- Added the Boston housing dataset.
- New `shuffle` parameter for `classifier.NeuralNetMLP`.
- The `mlxtend.preprocessing.standardize` function now optionally returns the parameters, which are estimated from the array, for re-use. A further improvement makes the `standardize` function smarter in order to avoid zero-division errors.
function smarter in order to avoid zero-division errors - Cosmetic improvements to the
evaluate.plot_decision_regions
function such as hiding plot axes - Renaming of
classifier.EnsembleClassfier
toclassifier.EnsembleVoteClassifier
- Improved random weight initialization in
Perceptron
,Adaline
,LinearRegression
, andLogisticRegression
- Changed
learning
parameter ofmlxtend.classifier.Adaline
tosolver
and added "normal equation" as closed-form solution solver - Hide y-axis labels in
mlxtend.evaluate.plot_decision_regions
in 1 dimensional evaluations - Sequential Feature Selection algorithms were unified into a single
SequentialFeatureSelector
class with parameters to enable floating selection and toggle between forward and backward selection. - Stratified sampling of MNIST (now 500x random samples from each of the 10 digit categories)
- Renaming
mlxtend.plotting
tomlxtend.general_plotting
in order to distinguish general plotting function from specialized utility function such asevaluate.plot_decision_regions
- Sequential Feature Selection algorithms: SFS, SFFS, SBS, and SFBS
- Changed the `regularization` & `lambda` parameters in `LogisticRegression` to the single parameter `l2_lambda`.
- API changes:
    - `mlxtend.sklearn.EnsembleClassifier` -> `mlxtend.classifier.EnsembleClassifier`
    - `mlxtend.sklearn.ColumnSelector` -> `mlxtend.feature_selection.ColumnSelector`
    - `mlxtend.sklearn.DenseTransformer` -> `mlxtend.preprocessing.DenseTransformer`
    - `mlxtend.pandas.standardizing` -> `mlxtend.preprocessing.standardizing`
    - `mlxtend.pandas.minmax_scaling` -> `mlxtend.preprocessing.minmax_scaling`
    - `mlxtend.matplotlib` -> `mlxtend.plotting`
- Added a momentum learning parameter (alpha coefficient) to `mlxtend.classifier.NeuralNetMLP`.
- Added an adaptive learning rate (decrease constant) to `mlxtend.classifier.NeuralNetMLP`.
- `mlxtend.pandas.minmax_scaling` became `mlxtend.preprocessing.minmax_scaling` and now also supports NumPy arrays.
- `mlxtend.pandas.standardizing` became `mlxtend.preprocessing.standardizing` and now supports both NumPy arrays and pandas DataFrames; it also has a new `ddof` parameter to set the degrees of freedom when calculating the standard deviation.
- Added multilayer perceptron (feedforward artificial neural network) classifier as `mlxtend.classifier.NeuralNetMLP`.
- Added 5000 labeled training samples from the MNIST handwritten digits dataset to `mlxtend.data`.
- Added ordinary least squares regression using different solvers (gradient and stochastic gradient descent, and the closed-form solution (normal equation)).
- Added an option for random weight initialization to the logistic regression classifier and updated l2 regularization.
- Added the `wine` dataset to `mlxtend.data`.
- Added an `invert_axes` parameter to `mlxtend.matplotlib.enrichment_plot` to optionally plot the "Count" on the x-axis.
- New `verbose` parameter for `mlxtend.sklearn.EnsembleClassifier`. (by Alejandro C. Bahnsen)
- Added `mlxtend.pandas.standardizing` to standardize columns in a pandas DataFrame.
- Added the parameters `linestyles` and `markers` to `mlxtend.matplotlib.enrichment_plot`.
- `mlxtend.regression.lin_regplot` automatically adds `np.newaxis` and works with Python lists.
automatically adds np.newaxis and works w. python lists- Added tokenizers:
mlxtend.text.extract_emoticons
andmlxtend.text.extract_words_and_emoticons
- Added Sequential Backward Selection (mlxtend.sklearn.SBS)
- Added
X_highlight
parameter tomlxtend.evaluate.plot_decision_regions
for highlighting test data points. - Added mlxtend.regression.lin_regplot to plot the fitted line from linear regression.
- Added mlxtend.matplotlib.stacked_barplot to conveniently produce stacked barplots using pandas
DataFrame
s. - Added mlxtend.matplotlib.enrichment_plot
- Added
scoring
tomlxtend.evaluate.learning_curves
(by user pfsq) - Fixed setup.py bug caused by the missing README.html file
- `matplotlib.category_scatter` for pandas DataFrames and NumPy arrays.
- Added logistic regression.
- The gradient descent and stochastic gradient descent perceptrons were changed to Adaline (Adaptive Linear Neuron).
- Perceptron and Adaline for {0, 1} classes.
- Added the `mlxtend.preprocessing.shuffle_arrays_unison` function to shuffle one or more NumPy arrays.
- Added shuffle and random seed parameters to the stochastic gradient descent classifier.
- Added an `rstrip` parameter to `mlxtend.file_io.find_filegroups` to allow trimming of base names.
- Added an `ignore_substring` parameter to `mlxtend.file_io.find_filegroups` and `find_files`.
- Replaced `.rstrip` in `mlxtend.file_io.find_filegroups` with a more robust regex.
- Grid search support for `mlxtend.sklearn.EnsembleClassifier`.
- Improved robustness of the `EnsembleClassifier`.
- Extended `plot_decision_regions()` functionality for plotting 1D decision boundaries.
- The function `matplotlib.plot_decision_regions` was reorganized into `evaluate.plot_decision_regions`.
- `evaluate.plot_learning_curves()` function added.
- Added Rosenblatt, gradient descent, and stochastic gradient descent perceptrons.
- Added `mlxtend.pandas.minmax_scaling`, a function to rescale pandas DataFrame columns.
- Slight update to the `EnsembleClassifier` interface (additional `voting` parameter).
- Fixed the `EnsembleClassifier` to return correct class labels if class labels are not integers from 0 to n.
- Added new matplotlib function to plot decision regions of classifiers.
- Improved `mlxtend.text.generalize_duplcheck` to remove duplicates and prevent an endless looping issue.
- Added a `recursive` search parameter to `mlxtend.file_io.find_files`.
- Added a `check_ext` parameter to `mlxtend.file_io.find_files` to search based on file extensions.
- Default parameter to ignore invisible files for `mlxtend.file_io.find`.
- Added `transform` and `fit_transform` to the `EnsembleClassifier`.
- Added the `mlxtend.file_io.find_filegroups` function.
- Implemented a scikit-learn `EnsembleClassifier` (majority voting rule) class.
- Improvements to `mlxtend.text.generalize_names` to handle certain Dutch last name prefixes (van, van der, de, etc.).
- Added the `mlxtend.text.generalize_name_duplcheck` function to apply `mlxtend.text.generalize_names` to a pandas DataFrame without creating duplicates.
- Added text utilities with a name generalization function.
- Added file_io utilities.
- Added combinations and permutations estimators.
- Added `DenseTransformer` for pipelines and grid search.
- The `mean_centering` function is now a class that creates `MeanCenterer` objects, which can be used to fit data via the `fit` method and center data at the column means via the `transform` and `fit_transform` methods.
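The `fit`/`transform` pattern described for `MeanCenterer` can be sketched in plain Python (toy illustration, not the mlxtend class):

```python
class MeanCenterer:
    """Learn column means with fit(), then subtract them in transform()."""

    def fit(self, X):
        n = len(X)
        # one mean per column
        self.col_means_ = [sum(row[j] for row in X) / n
                           for j in range(len(X[0]))]
        return self

    def transform(self, X):
        return [[x - m for x, m in zip(row, self.col_means_)] for row in X]

    def fit_transform(self, X):
        return self.fit(X).transform(X)

X = [[1.0, 10.0], [3.0, 20.0]]
print(MeanCenterer().fit_transform(X))  # [[-1.0, -5.0], [1.0, 5.0]]
```

Returning `self` from `fit` enables the chained `fit_transform` call, mirroring the scikit-learn transformer convention.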
- Added the `preprocessing` module and `mean_centering` function.
- Added `matplotlib` utilities and the `remove_borders` function.
- Simplified code for ColumnSelector.