
Version 0.22.0

In Development

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

  • decomposition.SparseCoder, decomposition.DictionaryLearning, and decomposition.MiniBatchDictionaryLearning
  • decomposition.SparseCoder with algorithm='lasso_lars'
  • decomposition.SparsePCA where normalize_components has no effect due to deprecation.
  • linear_model.Ridge when X is sparse.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog

sklearn.calibration

  • Fixed a bug that made calibration.CalibratedClassifierCV fail when given a sample_weight parameter of type list (in the case where sample weights are not supported by the wrapped estimator). 13575 by William de Vazelhes <wdevazelhes>.

sklearn.datasets

  • datasets.fetch_openml now supports heterogeneous data using pandas by setting as_frame=True. 13902 by Thomas Fan.
  • The parameter return_X_y was added to datasets.fetch_20newsgroups and datasets.fetch_olivetti_faces. 14259 by Sourav Singh <souravsingh>.

sklearn.decomposition

  • decomposition.sparse_encode() now passes max_iter to the underlying LassoLars when algorithm='lasso_lars'. 12650 by Adrin Jalali.
  • decomposition.dict_learning() and decomposition.dict_learning_online() now accept method_max_iter and pass it to sparse_encode. 12650 by Adrin Jalali.
  • decomposition.SparseCoder, decomposition.DictionaryLearning, and decomposition.MiniBatchDictionaryLearning now take a transform_max_iter parameter and pass it to either decomposition.dict_learning() or decomposition.sparse_encode(). 12650 by Adrin Jalali.
  • decomposition.IncrementalPCA now accepts sparse matrices as input, converting them to dense in batches, thereby avoiding the need to store the entire dense matrix at once. 13960 by Scott Gigante <scottgigante>.
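
The batch-wise densification described for decomposition.IncrementalPCA can be illustrated with SciPy alone. This is a minimal sketch, not scikit-learn's code. The helper iter_dense_batches is hypothetical, and we assume each dense batch would then be handed to an incremental fitting step such as partial_fit.

```python
import numpy as np
import scipy.sparse as sp


def iter_dense_batches(X, batch_size):
    """Yield consecutive row batches of a sparse matrix as dense arrays.

    Only one batch is densified at a time, so peak memory is roughly
    batch_size * n_features rather than n_samples * n_features.
    """
    X = sp.csr_matrix(X)  # CSR supports efficient row slicing
    n_samples = X.shape[0]
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        yield X[start:stop].toarray()


# Example: a 10 x 4 sparse matrix processed in batches of 3 rows.
X = sp.random(10, 4, density=0.3, format="csr", random_state=0)
batches = list(iter_dense_batches(X, batch_size=3))
print([b.shape for b in batches])  # [(3, 4), (3, 4), (3, 4), (1, 4)]
```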

sklearn.ensemble

  • ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor have an additional parameter called warm_start that enables warm starting. 14012 by Johann Faouzi <johannfaouzi>.
  • ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now bin the training and validation data separately to avoid data leakage. 13933 by Nicolas Hug.
  • ensemble.VotingClassifier.predict_proba will no longer be present when voting='hard'. 14287 by Thomas Fan.
  • In ensemble.HistGradientBoostingClassifier, the training loss or score is now monitored on a class-wise stratified subsample to preserve the class balance of the original training set. 14194 by Johann Faouzi <johannfaouzi>.
  • ensemble.AdaBoostClassifier now computes probabilities from the decision function, as in the literature, so predict and predict_proba give consistent results. 14114 by Guillaume Lemaitre <glemaitre>.
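
Binning the training and validation data separately means the bin thresholds are learned from the training split only. Those same thresholds are then reused, unchanged, on the validation split. The sketch below shows this idea with quantile-based edges in NumPy. The helper names, the quantile strategy, and n_bins=4 are illustrative assumptions; scikit-learn's internal binner differs in its details.

```python
import numpy as np


def fit_bin_edges(X_train, n_bins):
    """Compute per-feature bin edges from the training data only."""
    # Interior percentiles, e.g. n_bins=4 -> 25th, 50th and 75th.
    percentiles = np.linspace(0, 100, n_bins + 1)[1:-1]
    return [np.percentile(X_train[:, j], percentiles) for j in range(X_train.shape[1])]


def apply_bins(X, edges):
    """Map each value to a bin index using previously fitted edges."""
    binned = np.empty(X.shape, dtype=np.uint8)
    for j, feature_edges in enumerate(edges):
        binned[:, j] = np.searchsorted(feature_edges, X[:, j], side="right")
    return binned


rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
X_train, X_val = X[:80], X[80:]

edges = fit_bin_edges(X_train, n_bins=4)  # validation data is never used here
X_train_binned = apply_bins(X_train, edges)
X_val_binned = apply_bins(X_val, edges)   # same edges, no refitting
```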

sklearn.linear_model

  • linear_model.BayesianRidge now accepts hyperparameters alpha_init and lambda_init which can be used to set the initial value of the maximization procedure in fit. 13618 by Yoshihiro Uchida <c56pony>.
  • linear_model.Ridge now correctly fits an intercept when X is sparse, solver="auto" and fit_intercept=True, because the default solver in this configuration has changed to sparse_cg, which can fit an intercept with sparse data. 13995 by Jérôme Dockès <jeromedockes>.
  • The 'liblinear' logistic regression solver is now faster and requires less memory. 14108, 14170 by Alex Henrie <alexhenrie>.
  • linear_model.Ridge with solver='sag' now accepts F-ordered arrays and converts them instead of failing. 14458 by Guillaume Lemaitre <glemaitre>.

sklearn.metrics

  • Added multiclass support to metrics.roc_auc_score. 12789 by Kathy Chen <kathyxchen>, Mohamed Maskani <maskani-moh>, and Thomas Fan <thomasjpfan>.
  • Added metrics.mean_tweedie_deviance, which measures the Tweedie deviance for a power parameter p. Also added the mean Poisson deviance metrics.mean_poisson_deviance and the mean Gamma deviance metrics.mean_gamma_deviance, which are special cases of the Tweedie deviance for p=1 and p=2, respectively. 13938 by Christian Lorentzen <lorentzenchr> and Roman Yurchak.
  • The beta parameter of metrics.fbeta_score now accepts 0 and float('+inf'). 13231 by Dong-hee Na <corona10>.
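
For reference, the unit deviances behind these metrics can be written out directly. The following NumPy sketch is an illustration, not scikit-learn's implementation. The function name mean_tweedie_deviance_sketch is hypothetical, and input validation (which combinations of p, y_true and y_pred are allowed) is omitted. It shows that p=0 reduces to the squared error, while p=1 and p=2 give the Poisson and Gamma special cases.

```python
import numpy as np


def mean_tweedie_deviance_sketch(y_true, y_pred, p):
    """Mean Tweedie deviance for power p (illustrative, no input validation)."""
    y = np.asarray(y_true, dtype=float)
    mu = np.asarray(y_pred, dtype=float)
    if p == 0:
        # Normal distribution: the deviance is the squared error.
        dev = (y - mu) ** 2
    elif p == 1:
        # Poisson: 2 * (y * log(y / mu) - y + mu), with 0 * log(0) taken as 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            ylogy = np.where(y > 0, y * np.log(y / mu), 0.0)
        dev = 2 * (ylogy - y + mu)
    elif p == 2:
        # Gamma: 2 * (log(mu / y) + y / mu - 1); requires y > 0.
        dev = 2 * (np.log(mu / y) + y / mu - 1)
    else:
        # General power p.
        dev = 2 * (
            np.maximum(y, 0) ** (2 - p) / ((1 - p) * (2 - p))
            - y * mu ** (1 - p) / (1 - p)
            + mu ** (2 - p) / (2 - p)
        )
    return float(np.mean(dev))


y_true, y_pred = [2.0, 0.0, 3.0], [3.0, 1.0, 3.0]
print(mean_tweedie_deviance_sketch(y_true, y_pred, p=0))  # same as the mean squared error
print(mean_tweedie_deviance_sketch(y_true, y_pred, p=1))  # mean Poisson deviance
```

The general formula approaches the Poisson case as p tends to 1, which is one way to check the special cases against each other.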

sklearn.model_selection

  • model_selection.learning_curve now accepts parameter return_times which can be used to retrieve computation times in order to plot model scalability (see learning_curve example). 13938 by Hadrien Reboul <H4dr1en>.

sklearn.pipeline

  • pipeline.Pipeline now supports score_samples if the final estimator does. 13806 by Anaël Beaugnon <ab-anssi>.
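
Pipeline.score_samples passes the data through the transform method of every step except the last. It then calls score_samples on the final estimator. The pure-Python sketch below mimics that delegation with hypothetical toy classes (Center, NegSquare, TinyPipeline); it is not scikit-learn's Pipeline.

```python
class Center:
    """Hypothetical toy transformer: subtracts the training mean."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class NegSquare:
    """Hypothetical toy final estimator that exposes score_samples."""

    def fit(self, X, y=None):
        return self

    def score_samples(self, X):
        # Higher scores for points closer to the (centered) origin.
        return [-(x ** 2) for x in X]


class TinyPipeline:
    """Minimal stand-in for pipeline.Pipeline, for illustration only."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def score_samples(self, X):
        # Transform through every step except the last, then delegate.
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].score_samples(X)


pipe = TinyPipeline([("center", Center()), ("score", NegSquare())]).fit([1.0, 2.0, 3.0])
print(pipe.score_samples([1.0, 4.0]))  # [-1.0, -4.0]
```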

sklearn.svm

  • svm.SVC and svm.NuSVC now accept a break_ties parameter. When break_ties=True, decision_function_shape='ovr', and there are more than two target classes, predict breaks ties according to the confidence values of decision_function. 12557 by Adrin Jalali.
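
Here is a simplified sketch of what break_ties changes. With one-vs-one voting, several classes can end up with the same number of votes. The helper predict_from_votes and its vote and confidence inputs are hypothetical. In svm.SVC the tie-breaking is driven by the decision_function values rather than by a separate confidence array.

```python
import numpy as np


def predict_from_votes(votes, confidences, break_ties=False):
    """Pick a class index from one-vs-one vote counts (simplified sketch)."""
    votes = np.asarray(votes, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    if not break_ties:
        # np.argmax returns the first maximum, i.e. the lowest class index.
        return int(np.argmax(votes))
    # Among the classes sharing the most votes, pick the most confident one.
    tied = np.flatnonzero(votes == votes.max())
    return int(tied[np.argmax(confidences[tied])])


votes = [1, 1, 1]  # three classes, each winning one pairwise duel
confidences = [0.2, 0.9, 0.4]
print(predict_from_votes(votes, confidences))                   # 0
print(predict_from_votes(votes, confidences, break_ties=True))  # 1
```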

sklearn.preprocessing

  • Avoided an unnecessary data copy when fitting the preprocessors preprocessing.StandardScaler, preprocessing.MinMaxScaler, preprocessing.MaxAbsScaler, preprocessing.RobustScaler and preprocessing.QuantileTransformer, resulting in a slight performance improvement. 13987 by Roman Yurchak.

sklearn.cluster

  • cluster.SpectralClustering now accepts an n_components parameter, bringing the class's functionality in line with cluster.spectral_clustering. 13726 by Shuzhe Xiao <fdas3213>.

sklearn.feature_selection

  • Fixed a bug where feature_selection.VarianceThreshold with threshold=0 did not remove constant features due to numerical instability; in this case, the range is now used instead of the variance. 13704 by Roddy MacSween <rlms>.
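
The fix can be sketched in NumPy. When threshold=0, the sketch takes the smaller of each feature's variance and its peak-to-peak range, so a constant column always scores exactly 0. The helper nonconstant_mask is hypothetical and only mirrors the idea described above.

```python
import numpy as np


def nonconstant_mask(X, threshold=0.0):
    """Boolean mask of features to keep (sketch of the threshold=0 logic)."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)
    if threshold == 0:
        # The peak-to-peak range of a constant column is exactly 0, whereas
        # its floating-point variance may come out as a tiny positive number.
        peak_to_peak = X.max(axis=0) - X.min(axis=0)
        variances = np.minimum(variances, peak_to_peak)
    return variances > threshold


X = np.array([[0.1, 1.0],
              [0.1, 2.0],
              [0.1, 3.0]])
print(nonconstant_mask(X))  # [False  True]
```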

sklearn.utils

  • utils.safe_indexing now accepts an axis parameter to index array-likes along rows or columns. Column indexing works on NumPy arrays, SciPy sparse matrices, and pandas DataFrames. 14035 by Guillaume Lemaitre <glemaitre>.
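
For a NumPy array, indexing rows or columns through an axis parameter amounts to the following sketch. The helper take_indices is hypothetical and covers dense arrays only. The entry above notes that utils.safe_indexing also handles SciPy sparse matrices and pandas DataFrames, which this sketch does not.

```python
import numpy as np


def take_indices(X, indices, axis=0):
    """Select rows (axis=0) or columns (axis=1) of a 2-D array (sketch)."""
    if axis not in (0, 1):
        raise ValueError("axis should be either 0 or 1")
    return np.take(np.asarray(X), indices, axis=axis)


X = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
print(take_indices(X, [1], axis=0).tolist())     # [[3, 4, 5]]
print(take_indices(X, [0, 2], axis=1).tolist())  # [[0, 2], [3, 5]]
```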

sklearn.neural_network

  • Added a max_fun parameter to neural_network.BaseMultilayerPerceptron, neural_network.MLPRegressor, and neural_network.MLPClassifier to cap the number of loss function calls the 'lbfgs' solver may make before convergence (as determined by tol) is reached. 9274 by Daniel Perry <daniel-perry>.
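
Because max_fun applies to the 'lbfgs' solver, its effect can be illustrated with SciPy's L-BFGS-B optimizer, which has a similar cap on function evaluations (maxfun). This sketch minimizes the Rosenbrock function rather than an MLP loss. It assumes, without showing scikit-learn's internals, that max_fun plays the same role as maxfun here. Exact evaluation counts vary across SciPy versions, so none are shown.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])

# Tight cap on function evaluations, analogous to a small max_fun.
limited = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B",
                   options={"maxfun": 5, "maxiter": 1000})

# Generous cap: the solver normally stops on its tolerance criterion instead.
full = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B",
                options={"maxfun": 15000, "maxiter": 1000})

print(limited.success, limited.message)  # stopped early by the evaluation cap
print(full.success, full.nfev)
```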

Miscellaneous

  • Replaced manual checks with check_is_fitted. Errors raised when using non-fitted estimators are now more uniform. 13013 by Agamemnon Krasoulis <agamemnonc>.
  • Ported lobpcg from SciPy 1.3+, which includes bug fixes that are not available in earlier SciPy versions. 14195 by Guillaume Lemaitre <glemaitre>.

Changes to estimator checks

These changes mostly affect library developers.

  • Estimators are now expected to raise a NotFittedError if predict or transform is called before fit; previously an AttributeError or ValueError was acceptable. 13013 by Agamemnon Krasoulis <agamemnonc>.
  • Binary-only classifiers are now supported in estimator checks. Such classifiers need to have the binary_only=True estimator tag. 13875 by Trevor Stephens.
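
For library developers, the expected behaviour can be sketched in plain Python. NotFittedError below is a local stand-in that mirrors the base classes of sklearn.exceptions.NotFittedError (ValueError and AttributeError). require_fitted and MeanPredictor are hypothetical. Fitted attributes follow the trailing-underscore naming that scikit-learn estimators use, but this is not the library's check_is_fitted implementation.

```python
class NotFittedError(ValueError, AttributeError):
    """Local stand-in mirroring the bases of sklearn's NotFittedError."""


def require_fitted(estimator):
    """Raise NotFittedError unless fit() set a trailing-underscore attribute.

    Hypothetical simplification of a check_is_fitted-style helper.
    """
    fitted_attrs = [name for name in vars(estimator)
                    if name.endswith("_") and not name.startswith("__")]
    if not fitted_attrs:
        raise NotFittedError(
            f"This {type(estimator).__name__} instance is not fitted yet. "
            "Call 'fit' before using this estimator."
        )


class MeanPredictor:
    """Hypothetical toy estimator that predicts the training mean."""

    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, n):
        require_fitted(self)  # uniform error before touching fitted state
        return [self.mean_] * n


try:
    MeanPredictor().predict(3)
except NotFittedError as exc:
    print(exc)  # the "not fitted yet" message

print(MeanPredictor().fit([1.0, 3.0]).predict(2))  # [2.0, 2.0]
```

Because the stand-in subclasses both ValueError and AttributeError, code that previously caught either exception keeps working.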