sklearn
In Development
For a short description of the main highlights of the release, please refer to sphx_glr_auto_examples_release_highlights_plot_release_highlights_0_22_0.py.
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
- cluster.KMeans when n_jobs=1.
- decomposition.SparseCoder, decomposition.DictionaryLearning, and decomposition.MiniBatchDictionaryLearning.
- decomposition.SparseCoder with algorithm='lasso_lars'.
- decomposition.SparsePCA where normalize_components has no effect due to deprecation.
- ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor.
- impute.IterativeImputer when X has features with no missing values.
- linear_model.Ridge when X is sparse.
- model_selection.StratifiedKFold and any use of cv=int with a classifier.
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
- From version 0.24, base.BaseEstimator.get_params will raise an AttributeError rather than return None for parameters that are in the estimator's constructor but not stored as attributes on the instance. #14464 by Joel Nothman.
- Fixed a bug that made calibration.CalibratedClassifierCV fail when given a sample_weight parameter of type list (in the case where sample weights are not supported by the wrapped estimator). #13575 by William de Vazelhes <wdevazelhes>.
- cluster.SpectralClustering now accepts a precomputed sparse neighbors graph as input. #10482 by Tom Dupre la Tour and Kumar Ashutosh <thechargedneutron>.
- cluster.SpectralClustering now accepts an n_components parameter. This parameter extends the functionality of the SpectralClustering class to match cluster.spectral_clustering. #13726 by Shuzhe Xiao <fdas3213>.
- Fixed a bug where cluster.KMeans produced inconsistent results between n_jobs=1 and n_jobs>1 due to the handling of the random state. #9288 by Bryan Yang <bryanyang0528>.
- Fixed a bug where the elkan algorithm in cluster.KMeans produced a segmentation fault on large arrays due to integer index overflow. #15057 by Vladimir Korolev <balodja>.
- cluster.MeanShift now accepts a max_iter parameter with a default value of 300, instead of always using a fixed value of 300. It also now exposes an n_iter_ attribute indicating the maximum number of iterations performed on each seed. #15120 by Adrin Jalali.
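The following is a minimal one-dimensional sketch of flat-kernel mean shift that makes the MeanShift change concrete. It uses every data point as a seed. The function name mean_shift_1d, the tol value and the flat kernel are assumptions made for illustration, and this is not scikit-learn's implementation. It only shows how a per-seed max_iter cap relates to an n_iter_-style value that records the largest number of iterations any seed needed.

```python
def mean_shift_1d(points, bandwidth, max_iter=300, tol=1e-3):
    """Flat-kernel mean shift in one dimension (illustrative sketch only).

    Every point is used as a seed. Each seed is repeatedly moved to the mean
    of the points lying within `bandwidth` of it, until it moves less than
    `tol` or `max_iter` iterations are reached. Returns the final seed
    positions and the largest iteration count over all seeds.
    """
    modes = []
    n_iter = 0
    for seed in points:
        center = seed
        for it in range(1, max_iter + 1):
            # The window is never empty: the new center always lies within
            # the span of the previous window's points.
            window = [p for p in points if abs(p - center) <= bandwidth]
            new_center = sum(window) / len(window)
            shift = abs(new_center - center)
            center = new_center
            if shift < tol:
                break
        n_iter = max(n_iter, it)  # keep the worst case over seeds
        modes.append(center)
    return modes, n_iter


modes, n_iter = mean_shift_1d([1.0, 1.1, 1.2, 5.0, 5.1], bandwidth=0.5)
print(n_iter)
```

With max_iter=1, every seed stops after a single shift. The reported count therefore drops to 1 even if some seeds have not converged.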
- Fixed a bug in compose.ColumnTransformer which failed to select the proper columns when using a boolean list, with NumPy older than 1.12. #14510 by Guillaume Lemaitre <glemaitre>.
- Fixed a bug in compose.TransformedTargetRegressor which did not pass **fit_params to the underlying regressor. #14890 by Miguel Cabrera <mfcabrera>.
- Fixed a bug where cross_decomposition.PLSCanonical and cross_decomposition.PLSRegression were raising an error when fitted with a target matrix Y in which the first column was constant. #13609 by Camila Williamson <camilaagw>.
- datasets.fetch_openml now supports heterogeneous data using pandas by setting as_frame=True. #13902 by Thomas Fan.
- The parameter return_X_y was added to datasets.fetch_20newsgroups and datasets.fetch_olivetti_faces. #14259 by Sourav Singh <souravsingh>.
- datasets.make_classification now accepts an array-like weights parameter, i.e. a list or a numpy.array, instead of a list only. #14764 by Cat Chenal <CatChenal>.
- Fixed a bug in datasets.fetch_openml, which failed to load an OpenML dataset that contains an ignored feature. #14623 by Sarra Habchi <HabchiSarra>.
- decomposition.dict_learning() and decomposition.dict_learning_online() now accept method_max_iter and pass it to decomposition.sparse_encode. #12650 by Adrin Jalali.
- decomposition.SparseCoder, decomposition.DictionaryLearning, and decomposition.MiniBatchDictionaryLearning now take a transform_max_iter parameter and pass it to either decomposition.dict_learning() or decomposition.sparse_encode(). #12650 by Adrin Jalali.
- decomposition.IncrementalPCA now accepts sparse matrices as input, converting them to dense in batches, thereby avoiding the need to store the entire dense matrix at once. #13960 by Scott Gigante <scottgigante>.
- decomposition.sparse_encode() now passes max_iter to the underlying linear_model.LassoLars when algorithm='lasso_lars'. #12650 by Adrin Jalali.
- dummy.DummyClassifier now handles checking the existence of the provided constant in multioutput cases. #14908 by Martina G. Vilas <martinagvilas>.
- The outputs_2d_ attribute is deprecated in dummy.DummyClassifier and dummy.DummyRegressor. It is equivalent to n_outputs > 1. #14933 by Nicolas Hug.
- Added ensemble.StackingClassifier and ensemble.StackingRegressor to stack predictors using a final classifier or regressor. #11047 by Guillaume Lemaitre <glemaitre> and Caio Oliveira <caioaao>.
- Many improvements were made to ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor:
  - Estimators now natively support dense data with missing values both for training and predicting. They also support infinite values. #13911 and #14406 by Nicolas Hug, Adrin Jalali and Olivier Grisel.
  - Estimators now have an additional warm_start parameter that enables warm starting. #14012 by Johann Faouzi <johannfaouzi>.
  - For ensemble.HistGradientBoostingClassifier, the training loss or score is now monitored on a class-wise stratified subsample to preserve the class balance of the original training set. #14194 by Johann Faouzi <johannfaouzi>.
  - inspection.partial_dependence and inspection.plot_partial_dependence now support the fast 'recursion' method for both estimators. #13769 by Nicolas Hug.
  - ensemble.HistGradientBoostingRegressor now supports the 'least_absolute_deviation' loss. #13896 by Nicolas Hug.
  - Estimators now bin the training and validation data separately to avoid any data leak. #13933 by Nicolas Hug.
  - Fixed a bug where early stopping would break with string targets. #14710 by Guillaume Lemaitre <glemaitre>.
  - ensemble.HistGradientBoostingClassifier now raises an error if categorical_crossentropy loss is given for a binary classification problem. #14869 by Adrin Jalali.

  Note that pickles from 0.21 will not work in 0.22.
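The class-wise stratified subsample used to monitor the training loss can be illustrated with a small standard-library sketch. The helper stratified_subsample is hypothetical and is not scikit-learn's code. It assumes the subsample size is split across classes in proportion to their frequency, with rounding, which is one simple way to preserve the class balance of the training set.

```python
import random
from collections import defaultdict


def stratified_subsample(y, n_samples, seed=0):
    """Return sorted indices of a subsample of about n_samples elements whose
    class proportions match those of y (sketch; assumes n_samples <= len(y))."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    indices = []
    for label, members in sorted(by_class.items()):
        # Give each class a share of the subsample proportional to its size.
        k = round(n_samples * len(members) / len(y))
        indices.extend(rng.sample(members, k))
    return sorted(indices)


y = [0] * 90 + [1] * 10
idx = stratified_subsample(y, 20)
print(len(idx), sum(1 for i in idx if y[i] == 1))
```

A plain random subsample of 20 points from this 90/10 split could easily contain no minority-class point at all. Stratifying keeps the 9:1 ratio, so the monitored score reflects both classes.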
- ensemble.VotingClassifier.predict_proba will no longer be present when voting='hard'. #14287 by Thomas Fan.
- utils.estimator_checks.check_estimator is now run by default on both ensemble.VotingClassifier and ensemble.VotingRegressor. This fixes shape-consistency issues in predict, which failed when the underlying estimators did not output consistent array dimensions. Note that this should be replaced by a refactoring of the common tests in the future. #14305 by Guillaume Lemaitre <glemaitre>.
- ensemble.AdaBoostClassifier computes probabilities based on the decision function, as in the literature. Thus, predict and predict_proba give consistent results. #14114 by Guillaume Lemaitre <glemaitre>.
- presort is now deprecated in ensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor, and the parameter has no effect. Users are recommended to use ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor instead. #14907 by Adrin Jalali.
- Added a max_samples argument, which allows limiting the size of bootstrap samples to be less than the size of the dataset. It was added to ensemble.ForestClassifier, ensemble.ForestRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor, and ensemble.RandomTreesEmbedding. #14682 by Matt Hancock <notmatthancock> and #5963 by Pablo Duboue <DrDub>.
- Stacking and Voting estimators now ensure that their underlying estimators are either all classifiers or all regressors. ensemble.StackingClassifier, ensemble.StackingRegressor, ensemble.VotingClassifier, and ensemble.VotingRegressor now raise consistent error messages. #15084 by Guillaume Lemaitre <glemaitre>.
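The max_samples entry above caps the size of the bootstrap sample drawn for each tree. The minimal sketch below makes three assumptions: None means one draw per training sample, an integer is an absolute count, and a float is a fraction of the number of training samples. The helper names are hypothetical, and the validation ranges are illustrative rather than copied from scikit-learn.

```python
import random


def n_bootstrap_samples(n_samples, max_samples=None):
    """Resolve how many rows to draw for one tree (hypothetical helper)."""
    if max_samples is None:
        return n_samples  # classic bootstrap: as many draws as samples
    if isinstance(max_samples, int):
        if not 1 <= max_samples <= n_samples:
            raise ValueError("integer max_samples must be in [1, n_samples]")
        return max_samples
    if not 0.0 < max_samples <= 1.0:
        raise ValueError("float max_samples must be in (0, 1]")
    return max(round(n_samples * max_samples), 1)


def bootstrap_indices(n_samples, max_samples=None, seed=0):
    """Draw row indices with replacement for a single tree (sketch)."""
    rng = random.Random(seed)
    size = n_bootstrap_samples(n_samples, max_samples)
    return [rng.randrange(n_samples) for _ in range(size)]


print(len(bootstrap_indices(1000, max_samples=0.1)))
```

Drawing fewer rows per tree makes each tree cheaper to fit. It also makes the trees less correlated, at the cost of each tree seeing less data.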
- A warning will now be raised if a parameter choice means that another parameter will be unused on calling the fit() method for feature_extraction.text.HashingVectorizer, feature_extraction.text.CountVectorizer and feature_extraction.text.TfidfVectorizer. #14602 by Gaurav Chawla <getgaurav2>.
- Functions created by build_preprocessor and build_analyzer of feature_extraction.text.VectorizerMixin can now be pickled. #14430 by Dillon Niederhut <deniederhut>.
- Deprecated the unused copy parameter of feature_extraction.text.TfidfVectorizer.transform; it will be removed in v0.24. #14520 by Guillem G. Subies <guillemgsubies>.
- feature_extraction.text.strip_accents_unicode now correctly removes accents from strings that are in NFKD normalized form. #15100 by Daniel Grady <DGrady>.
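The accent-stripping fix can be illustrated with the standard library's unicodedata module. The hypothetical strip_accents function below is not scikit-learn's source. It decomposes the string with NFKD and then always drops combining characters. Any shortcut that returns the input unchanged whenever NFKD normalization leaves it unchanged would miss accents in strings that are already decomposed.

```python
import unicodedata


def strip_accents(s):
    """Remove accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", s)
    # Always filter: an input that is already in NFKD form still carries its
    # accents as separate combining characters, so it cannot be returned as-is.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


precomposed = "caf\u00e9"                         # 'é' as a single code point
decomposed = unicodedata.normalize("NFKD", precomposed)  # 'e' + U+0301
print(strip_accents(precomposed), strip_accents(decomposed))
```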
- Fixed a bug where feature_selection.VarianceThreshold with threshold=0 did not remove constant features due to numerical instability; the range is now used instead of the variance in this case. #13704 by Roddy MacSween <rlms>.
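The hypothetical pure-Python sketch below shows why the range is a safer test than the variance when threshold=0. It is not scikit-learn's implementation. A feature is kept only if its peak-to-peak range (max minus min) is strictly positive. That quantity is exactly zero for a constant column, whereas a variance computed in floating point can come out as a tiny positive number.

```python
def naive_variance(column):
    """Two-pass floating-point variance (for illustration only)."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)


def non_constant_features(rows):
    """Indices of columns whose range (max - min) is strictly positive.

    For threshold=0 this comparison is exact: a constant column always has
    max == min, regardless of floating-point rounding in a mean.
    """
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if max(col) - min(col) > 0]


rows = [[0.1, float(i)] for i in range(10)]
print(naive_variance([0.1] * 10))   # tiny but non-zero, due to rounding
print(non_constant_features(rows))  # only the second column is kept
```

Here sum([0.1] * 10) / 10 is not exactly 0.1 in binary floating point. The deviations from the computed mean are therefore non-zero, and a "variance > 0" test would wrongly keep the constant column.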
- gaussian_process.GaussianProcessClassifier.log_marginal_likelihood and gaussian_process.GaussianProcessRegressor.log_marginal_likelihood now accept a clone_kernel=True keyword argument. When set to False, the kernel attribute is modified, which may result in a performance improvement. #14378 by Masashi Shibata <c-bata>.
- From version 0.24, gaussian_process.kernels.Kernel.get_params will raise an AttributeError rather than return None for parameters that are in the estimator's constructor but not stored as attributes on the instance. #14464 by Joel Nothman.
- Added impute.KNNImputer, to impute missing values using k-Nearest Neighbors. #12852 by Ashim Bhattarai <ashimb9> and Thomas Fan.
- impute.IterativeImputer has a new skip_compute flag that is False by default, which, when True, will skip computation on features that have no missing values during the fit phase. #13773 by Sergey Feldman <sergeyf>.
- impute.IterativeImputer now works when there is only one feature. By Sergey Feldman <sergeyf>.
- impute.MissingIndicator.fit_transform avoids repeated computation of the masked matrix. #14356 by Harsh Soni <harsh020>.
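The standard-library sketch below is a simplified version of k-nearest-neighbors imputation in the spirit of impute.KNNImputer. It assumes uniform weights and a NaN-aware Euclidean distance. That distance sums squared differences over the coordinates present in both samples and then rescales by (total coordinates / present coordinates) before taking the square root. The function names are hypothetical, and this is not scikit-learn's implementation. For example, it simply leaves a value missing when no donor is available.

```python
import math


def nan_euclidean(a, b):
    """Euclidean distance over coordinates present in both vectors,
    rescaled by (number of coordinates / number of present coordinates)."""
    present = [(x, y) for x, y in zip(a, b)
               if not (math.isnan(x) or math.isnan(y))]
    if not present:
        return math.nan
    sq = sum((x - y) ** 2 for x, y in present)
    return math.sqrt(len(a) / len(present) * sq)


def knn_impute(rows, k=2):
    """Replace each NaN with the mean of that feature over the k nearest rows
    (by nan_euclidean) in which the feature is observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            donors = [(nan_euclidean(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and not math.isnan(other[j])]
            # Ignore donors sharing no observed coordinate with this row.
            donors = sorted((d for d in donors if not math.isnan(d[0])),
                            key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


nan = float("nan")
print(knn_impute([[1.0, 2.0], [2.0, 4.0], [3.0, nan], [10.0, 20.0]], k=2))
```

In the usage example, the two nearest donors of [3.0, nan] are [2.0, 4.0] and [1.0, 2.0]. The missing entry is therefore filled with the mean of 4.0 and 2.0, which is 3.0.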
- inspection.permutation_importance has been added to measure the importance of each feature in an arbitrary trained model with respect to a given scoring function. #13146 by Thomas Fan.
- inspection.partial_dependence and inspection.plot_partial_dependence now support the fast 'recursion' method for ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. #13769 by Nicolas Hug.
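The idea behind inspection.permutation_importance can be sketched in plain Python:

1. Score a fitted model.
2. Shuffle one feature column to break its relationship with the target, then rescore.
3. Report the average drop in score over several repeats.

The predict function, score function and helper name below are hypothetical stand-ins chosen for illustration. The sketch does not reproduce the library's API.

```python
import random


def permutation_importance_sketch(predict, X, y, score, n_repeats=5, seed=0):
    """Mean decrease in `score` when each feature column of X is shuffled.

    `predict` maps one row (a list) to a prediction; `score(y_true, y_pred)`
    returns a number where higher is better.
    """
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def neg_mse(y_true, y_pred):
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


X = [[float(i), float(i % 3)] for i in range(20)]
y = [row[0] for row in X]
# A "model" that only looks at feature 0: feature 1 should get importance 0.
print(permutation_importance_sketch(lambda r: r[0], X, y, neg_mse))
```

The method needs only predictions and a score, so it works for any fitted model. This is what "an arbitrary trained model" means in the entry above.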
- Fixed a bug where kernel_approximation.Nystroem raised a KeyError when using kernel="precomputed". #14706 by Venkatachalam N <venkyyuvy>.
- linear_model.BayesianRidge now accepts hyperparameters alpha_init and lambda_init, which can be used to set the initial value of the maximization procedure in fit. #13618 by Yoshihiro Uchida <c56pony>.
- The 'liblinear' logistic regression solver is now faster and requires less memory. #14108, #14170, #14296 by Alex Henrie <alexhenrie>.
- linear_model.Ridge now correctly fits an intercept when X is sparse, solver="auto" and fit_intercept=True, because the default solver in this configuration has changed to sparse_cg, which can fit an intercept with sparse data. #13995 by Jérôme Dockès <jeromedockes>.
- linear_model.Ridge with solver='sag' now accepts F-ordered and non-contiguous arrays and makes a conversion instead of failing. #14458 by Guillaume Lemaitre <glemaitre>.
- linear_model.LassoCV no longer forces precompute=False when fitting the final model. #14591 by Andreas Müller.
- linear_model.RidgeCV and linear_model.RidgeClassifierCV now score correctly when cv=None. #14864 by Venkatachalam N <venkyyuvy>.
- Fixed a bug in linear_model.LogisticRegressionCV where the scores_, n_iter_ and coefs_paths_ attributes would have the wrong ordering with penalty='elasticnet'. #15044 by Nicolas Hug.
- Fixed a bug in linear_model.MultiTaskLassoCV and linear_model.MultiTaskElasticNetCV with X of dtype int and fit_intercept=True. #15086 by Alex Gramfort <agramfort>.
- manifold.Isomap, manifold.TSNE, and manifold.SpectralEmbedding now accept a precomputed sparse neighbors graph as input. #10482 by Tom Dupre la Tour and Kumar Ashutosh <thechargedneutron>.
- Exposed the n_jobs parameter in manifold.TSNE for multi-core calculation of the neighbors graph. This parameter has no impact when metric="precomputed" or (metric="euclidean" and method="exact"). #15082 by Roman Yurchak.
- Deprecated the unused training_data_ attribute of manifold.Isomap. #10482 by Tom Dupre la Tour.
- Fixed a bug where manifold.spectral_embedding (and therefore manifold.SpectralEmbedding and cluster.SpectralClustering) computed wrong eigenvalues with eigen_solver='amg' when n_samples < 5 * n_components. #14647 by Andreas Müller.
- Fixed a bug in manifold.spectral_embedding, used in manifold.SpectralEmbedding and cluster.SpectralClustering, where eigen_solver="amg" would sometimes result in a LinAlgError. #13393 by Andrew Knyazev <lobpcg> and #13707 by Scott White <whitews>.
- metrics.plot_roc_curve has been added to plot ROC curves. This function introduces the visualization API described in the User Guide <visualizations>. #14357 by Thomas Fan.
- Added the metrics.pairwise.nan_euclidean_distances metric, which calculates Euclidean distances in the presence of missing values. #12852 by Ashim Bhattarai <ashimb9> and Thomas Fan.
- New ranking metrics metrics.ndcg_score and metrics.dcg_score have been added to compute Discounted Cumulative Gain and Normalized Discounted Cumulative Gain. #9951 by Jérôme Dockès <jeromedockes>.
- Added multiclass support to metrics.roc_auc_score. #12789 by Kathy Chen <kathyxchen>, Mohamed Maskani <maskani-moh>, and Thomas Fan <thomasjpfan>.
- Added metrics.mean_tweedie_deviance, which measures the Tweedie deviance for a given power parameter power. Also added the mean Poisson deviance metrics.mean_poisson_deviance and the mean Gamma deviance metrics.mean_gamma_deviance, which are special cases of the Tweedie deviance for power=1 and power=2 respectively. #13938 by Christian Lorentzen <lorentzenchr> and Roman Yurchak.
- The beta parameter of metrics.fbeta_score now accepts the values 0 and float('+inf'). #13231 by Dong-hee Na <corona10>.
- Added a squared parameter to metrics.mean_squared_error; setting squared=False returns the root mean squared error. #13467 by Urvang Patel <urvang96>.
- Averaged metrics can now be computed in the case of no true positives. #14595 by Andreas Müller.
- metrics.silhouette_score now raises a ValueError when a precomputed distance matrix contains non-zero diagonal entries. #12258 by Stephen Tierney <sjtrny>.
- scoring="neg_brier_score" should be used instead of scoring="brier_score_loss", which is now deprecated. #14898 by Stefan Matcovici <stefan-matcovici>.
- Improved performance of metrics.pairwise.manhattan_distances in the case of sparse matrices. #15049 by Paolo Toccaceli <ptocca>.
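For reference, the unit deviance behind metrics.mean_tweedie_deviance can be written out directly. The pure-Python sketch below is not scikit-learn's code. For simplicity it assumes y > 0 and y_pred > 0 whenever power != 0, although for 1 <= power < 2 the library also accepts y_true == 0. It shows the special cases power=1 (Poisson) and power=2 (Gamma) mentioned in the entry above next to the general formula. The function names are hypothetical.

```python
import math


def unit_tweedie_deviance(y, mu, power):
    """Unit deviance of a Tweedie distribution with the given power
    (sketch; assumes y > 0 and mu > 0 whenever power != 0)."""
    if power == 0:
        return (y - mu) ** 2                               # Normal
    if power == 1:
        return 2 * (y * math.log(y / mu) - y + mu)         # Poisson
    if power == 2:
        return 2 * (math.log(mu / y) + y / mu - 1)         # Gamma
    # General case; its limits as power -> 1 and power -> 2 are the
    # Poisson and Gamma expressions above.
    return 2 * (y ** (2 - power) / ((1 - power) * (2 - power))
                - y * mu ** (1 - power) / (1 - power)
                + mu ** (2 - power) / (2 - power))


def mean_tweedie_deviance_sketch(y_true, y_pred, power=0):
    """Average unit deviance over paired observations and predictions."""
    return sum(unit_tweedie_deviance(y, mu, power)
               for y, mu in zip(y_true, y_pred)) / len(y_true)


print(mean_tweedie_deviance_sketch([1.0, 2.0], [1.0, 4.0], power=0))  # MSE
```

With power=0 the deviance reduces to the squared error, so the call above equals the mean squared error of the two predictions.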
- Improved performance of multimetric scoring in model_selection.cross_validate, model_selection.GridSearchCV, and model_selection.RandomizedSearchCV. #14593 by Thomas Fan.
- model_selection.learning_curve now accepts a return_times parameter, which can be used to retrieve computation times in order to plot model scalability (see the learning_curve example). #13938 by Hadrien Reboul <H4dr1en>.
- model_selection.RandomizedSearchCV now accepts lists of parameter distributions. #14549 by Andreas Müller.
- Reimplemented model_selection.StratifiedKFold to fix an issue where one test set could be n_classes larger than another. Test sets should now be near-equally sized. #14704 by Joel Nothman.
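According to the RandomizedSearchCV documentation, when a list of dictionaries is given, a dictionary is first chosen uniformly at random and a parameter setting is then sampled from it. The sketch below mimics that two-stage sampling with the standard library. Two things are assumptions made for illustration: the helper name, and the convention that each value is either a list (sampled uniformly) or a callable taking a random.Random instance. That callable is a hypothetical stand-in for a distribution's rvs method.

```python
import random


def sample_params(param_distributions, n_iter, seed=0):
    """Draw n_iter parameter settings from a dict or a list of dicts (sketch)."""
    rng = random.Random(seed)
    if isinstance(param_distributions, dict):
        param_distributions = [param_distributions]
    settings = []
    for _ in range(n_iter):
        grid = rng.choice(param_distributions)   # stage 1: pick a dict uniformly
        params = {}
        for name, dist in sorted(grid.items()):  # stage 2: sample each value
            if callable(dist):
                params[name] = dist(rng)         # hypothetical sampler callable
            else:
                params[name] = rng.choice(dist)  # lists are sampled uniformly
        settings.append(params)
    return settings


space = [
    {"kernel": ["linear"], "C": [1, 10, 100]},
    {"kernel": ["rbf"], "gamma": lambda r: r.uniform(0.1, 1.0)},
]
for setting in sample_params(space, 3):
    print(setting)
```

Separate dictionaries let mutually exclusive sub-spaces be searched together. In this example, gamma is only ever sampled for the rbf kernel.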
- multioutput.MultiOutputClassifier now has the attribute classes_. #14629 by Agamemnon Krasoulis <agamemnonc>.
- Added naive_bayes.CategoricalNB, which implements the Categorical Naive Bayes classifier. #12569 by Tim Bicker <timbicker> and Florian Wilhelm <FlorianWilhelm>.
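Categorical Naive Bayes, as described in the scikit-learn User Guide, estimates for each feature i and class c the probability of category t with additive smoothing: P(x_i = t | y = c) = (N_tic + alpha) / (N_c + alpha * n_i). Here N_tic counts class-c samples with x_i = t, N_c counts class-c samples, and n_i is the number of categories of feature i. The pure-Python sketch below fits these counts and predicts the class with the highest log posterior. The function names are hypothetical and this is not scikit-learn's implementation. For example, the library expects categories encoded as integers, while the sketch accepts any hashable values.

```python
import math
from collections import Counter


def fit_categorical_nb(X, y, alpha=1.0):
    """Collect the counts needed for smoothed categorical likelihoods."""
    classes = sorted(set(y))
    n_features = len(X[0])
    # n_i: number of distinct categories observed for feature i.
    n_categories = [len({row[i] for row in X}) for i in range(n_features)]
    class_counts = Counter(y)  # N_c
    # counts[c][i][t] = N_tic, number of class-c samples with x_i == t.
    counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    for row, label in zip(X, y):
        for i, t in enumerate(row):
            counts[label][i][t] += 1
    return classes, n_categories, class_counts, counts, alpha


def predict_categorical_nb(model, row):
    """Return the class with the largest log prior + log likelihood."""
    classes, n_categories, class_counts, counts, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log(class_counts[c] / total)                  # log prior
        for i, t in enumerate(row):
            numerator = counts[c][i][t] + alpha                     # N_tic + alpha
            denominator = class_counts[c] + alpha * n_categories[i] # N_c + alpha*n_i
            score += math.log(numerator / denominator)
        if score > best_score:
            best, best_score = c, score
    return best


X = [["red", "small"], ["red", "large"], ["blue", "small"],
     ["blue", "large"], ["blue", "small"]]
y = ["a", "a", "b", "b", "b"]
model = fit_categorical_nb(X, y)
print(predict_categorical_nb(model, ["red", "small"]))
```

The smoothing term alpha keeps unseen (class, category) pairs from producing a zero probability. For example, "red" never occurs in class "b", yet its smoothed likelihood is 1/5 rather than 0.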
- Added neighbors.KNeighborsTransformer and neighbors.RadiusNeighborsTransformer, which transform an input dataset into a sparse neighbors graph. They give finer control over nearest neighbors computations and enable easy pipeline caching for multiple uses. #10482 by Tom Dupre la Tour.
- neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor, and neighbors.LocalOutlierFactor now accept a precomputed sparse neighbors graph as input. #10482 by Tom Dupre la Tour and Kumar Ashutosh <thechargedneutron>.
- neighbors.RadiusNeighborsClassifier now supports predicting probabilities by using predict_proba, and supports more outlier_label options: 'most_frequent', or different outlier_labels for multi-outputs. #9597 by Wenbo Zhao <webber26232>.
- Efficiency improvements for neighbors.RadiusNeighborsClassifier.predict. #9597 by Wenbo Zhao <webber26232>.
- neighbors.KNeighborsRegressor now raises an error when metric='precomputed' and it is fit on non-square data. #14336 by Gregory Dexter <gdex1>.
- Added a max_fun parameter to neural_network.BaseMultilayerPerceptron, neural_network.MLPRegressor, and neural_network.MLPClassifier to give control over the maximum number of function evaluations when the tol improvement is not met. #9274 by Daniel Perry <daniel-perry>.
- pipeline.Pipeline now supports score_samples if the final estimator does. #13806 by Anaël Beaugnon <ab-anssi>.
- None as a transformer is now deprecated in pipeline.FeatureUnion. Please use 'drop' instead. #15053 by Thomas Fan.
- The fit method of pipeline.FeatureUnion now accepts fit_params to pass to the underlying transformers. #15119 by Adrin Jalali.
- Avoided unnecessary data copies when fitting the preprocessors preprocessing.StandardScaler, preprocessing.MinMaxScaler, preprocessing.MaxAbsScaler, preprocessing.RobustScaler and preprocessing.QuantileTransformer, which results in a slight performance improvement. #13987 by Roman Yurchak.
- preprocessing.KernelCenterer now raises an error when fit on non-square data. #14336 by Gregory Dexter <gdex1>.
- svm.SVC and svm.NuSVC now accept a break_ties parameter. This parameter results in predict breaking ties according to the confidence values of decision_function, if decision_function_shape='ovr' and the number of target classes is greater than 2. #12557 by Adrin Jalali.
- SVM estimators now raise a more specific error when kernel='precomputed' and they are fit on non-square data. #14336 by Gregory Dexter <gdex1>.
- svm.SVC, svm.SVR, svm.NuSVR and svm.OneClassSVM generated an invalid model when the sample_weight parameter of fit() received negative or zero values. This behavior occurred only in some edge cases. In these cases, fit() will now fail with an exception. #14286 by Alex Shacked <alexshacked>.
- The n_support_ attribute of svm.SVR and svm.OneClassSVM was previously uninitialized and had size 2. It now has size 1 and holds the correct value. #15099 by Nicolas Hug.
- Fixed a bug in BaseLibSVM._sparse_fit where n_SV=0 raised a ZeroDivisionError. #14894 by Danna Naser <danna-naser>.
- Added minimal cost-complexity pruning, controlled by ccp_alpha, to tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier, tree.ExtraTreeRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor, ensemble.RandomTreesEmbedding, ensemble.GradientBoostingClassifier, and ensemble.GradientBoostingRegressor. #12887 by Thomas Fan.
- presort is now deprecated in tree.DecisionTreeClassifier and tree.DecisionTreeRegressor, and the parameter has no effect. #14907 by Adrin Jalali.
- The classes_ and n_classes_ attributes of tree.DecisionTreeRegressor are now deprecated. #15028 by Mei Guan <meiguan>, Nicolas Hug, and Adrin Jalali.
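Minimal cost-complexity pruning, as described in the scikit-learn User Guide, assigns each internal node t an effective alpha, alpha_eff(t) = (R(t) - R(T_t)) / (|T_t| - 1). In this formula:

- R(t) is the weighted impurity of t when it is treated as a leaf.
- R(T_t) is the summed weighted impurity of the leaves of its subtree.
- |T_t| is the number of those leaves.

The node with the smallest alpha_eff (the weakest link) is collapsed first, and pruning stops once that smallest value exceeds ccp_alpha. The sketch below applies this rule to a toy tree of nested dictionaries. The tree layout (an "R" value per node, a "children" list for internal nodes) and the helper names are assumptions for illustration, not scikit-learn's internal structures.

```python
def leaves(node):
    """Leaf nodes of the subtree rooted at `node`."""
    kids = node.get("children", [])
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves(kid)]


def internal_nodes(node):
    """All nodes of the subtree that still have children."""
    if not node.get("children"):
        return []
    return [node] + [n for kid in node["children"] for n in internal_nodes(kid)]


def effective_alpha(node):
    """alpha_eff(t) = (R(t) - R(T_t)) / (|T_t| - 1) for an internal node."""
    subtree_leaves = leaves(node)
    r_subtree = sum(leaf["R"] for leaf in subtree_leaves)
    return (node["R"] - r_subtree) / (len(subtree_leaves) - 1)


def prune(root, ccp_alpha):
    """Collapse the weakest link while its alpha_eff <= ccp_alpha (in place)."""
    while True:
        candidates = internal_nodes(root)
        if not candidates:
            return root
        weakest = min(candidates, key=effective_alpha)
        if effective_alpha(weakest) > ccp_alpha:
            return root
        weakest["children"] = []  # the weakest link becomes a leaf


def make_tree():
    # R values are weighted impurities (impurity * fraction of samples).
    return {"R": 1.0, "children": [
        {"R": 0.3, "children": [{"R": 0.1}, {"R": 0.1}]},
        {"R": 0.2},
    ]}


for alpha in (0.05, 0.15, 1.0):
    print(alpha, len(leaves(prune(make_tree(), alpha))))
```

In this toy tree, the left subtree has alpha_eff = (0.3 - 0.2) / 1 = 0.1 and is pruned first. After that, the root has alpha_eff = (1.0 - 0.5) / 1 = 0.5. Larger ccp_alpha values therefore yield smaller trees.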
- utils.estimator_checks.check_estimator can now generate checks by setting generate_only=True. Previously, running utils.estimator_checks.check_estimator would stop when the first check failed. With generate_only=True, all checks can run independently and report the ones that are failing. Read more in rolling_your_own_estimator. #14381 by Thomas Fan.
- Added a pytest-specific decorator, utils.estimator_checks.parametrize_with_checks, to parametrize estimator checks for a list of estimators. #14381 by Thomas Fan.
- The following utils have been deprecated and are now private:
  - choose_check_classifiers_labels
  - enforce_estimator_tags_y
  - optimize.newton_cg
  - random.random_choice_csc
  - safe_indexing
- mocking.MockDataFrame and mocking.CheckingClassifier have also been deprecated and are now private.
- A new random variable, utils.fixes.loguniform, implements a log-uniform random variable (e.g., for use in RandomizedSearchCV). For example, the outcomes 1, 10 and 100 are all equally likely for loguniform(1, 100). See #11232 by Scott Sievert <stsievert> and Nathaniel Saul <sauln>, and SciPy PR 10815 <scipy/scipy#10815>.
- utils.safe_indexing (now deprecated) accepts an axis parameter to index array-like objects across rows and columns. Column indexing can be done on NumPy arrays, SciPy sparse matrices, and pandas DataFrames. An additional refactoring was done. #14035 and #14475 by Guillaume Lemaitre <glemaitre>.
- utils.extmath.safe_sparse_dot works between 3D+ ndarrays and sparse matrices. #14538 by Jérémie du Boisberranger <jeremiedbb>.
- utils.check_array now raises an error instead of casting NaN to integer. #14872 by Roman Yurchak.
- utils.check_array will now correctly detect numeric dtypes in pandas DataFrames, fixing a bug where float32 was upcast to float64 unnecessarily. #15094 by Andreas Müller.
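A log-uniform variable on [a, b] can be drawn by sampling uniformly in log space and exponentiating. This is why each decade (1 to 10, 10 to 100) receives the same probability mass under loguniform(1, 100), as described for utils.fixes.loguniform. The standard-library sketch below illustrates that construction only. It is not the utils.fixes.loguniform implementation, which follows SciPy's distribution interface, and the helper name is hypothetical.

```python
import math
import random


def sample_loguniform(a, b, size, seed=0):
    """Draw `size` values whose natural log is uniform on [log a, log b]."""
    rng = random.Random(seed)
    lo, hi = math.log(a), math.log(b)
    return [math.exp(rng.uniform(lo, hi)) for _ in range(size)]


samples = sample_loguniform(1, 100, 20000)
below_ten = sum(v < 10 for v in samples) / len(samples)
print(round(below_ten, 2))  # close to 0.5: each decade is equally likely
```

For comparison, a plain uniform distribution on [1, 100] would put only about 9% of its mass below 10. This is why log-uniform sampling suits search over scale parameters such as a regularization strength.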
- Fixed a bug where isotonic.IsotonicRegression.fit raised an error when X.dtype == 'float32' and X.dtype != y.dtype. #14902 by Lucas <lostcoaster>.
- Replaced manual checks with check_is_fitted. Errors thrown when using non-fitted estimators are now more uniform. #13013 by Agamemnon Krasoulis <agamemnonc>.
- Ported lobpcg from SciPy, which includes some bug fixes that are only available in SciPy 1.3+. #13609 by Guillaume Lemaitre <glemaitre>.
These changes mostly affect library developers.
- Estimators are now expected to raise a NotFittedError if predict or transform is called before fit; previously an AttributeError or ValueError was acceptable. #13013 by Agamemnon Krasoulis <agamemnonc>.
- Binary-only classifiers are now supported in estimator checks. Such classifiers need to have the binary_only=True estimator tag. #13875 by Trevor Stephens.
- The requires_positive_X estimator tag (for models that require X to be non-negative) is now used by utils.estimator_checks.check_estimator to make sure a proper error message is raised if X contains some negative entries. #14680 by Alex Gramfort <agramfort>.
- Added a check that pairwise estimators raise an error on non-square data. #14336 by Gregory Dexter <gdex1>.
- Added two common multioutput estimator tests, utils.estimator_checks.check_classifier_multioutput and utils.estimator_checks.check_regressor_multioutput. #13392 by Rok Mihevc <rok>.
- Added check_transformer_data_not_an_array to checks where missing