Version 0.14

August 7, 2013

Changelog

  • Missing values in both sparse and dense matrices can now be imputed with the transformer preprocessing.Imputer. By Nicolas Trésegnie.
  • The core implementation of decision trees has been rewritten from scratch, allowing for faster tree induction and lower memory consumption in all tree-based estimators. By Gilles Louppe.
  • Added ensemble.AdaBoostClassifier and ensemble.AdaBoostRegressor, by Noel Dawe and Gilles Louppe. See the AdaBoost <adaboost> section of the user guide for details and examples.
  • Added grid_search.RandomizedSearchCV and grid_search.ParameterSampler for randomized hyperparameter optimization. By Andreas Müller.
  • Added biclustering <biclustering> algorithms (sklearn.cluster.bicluster.SpectralCoclustering and sklearn.cluster.bicluster.SpectralBiclustering), data generation methods (sklearn.datasets.make_biclusters and sklearn.datasets.make_checkerboard), and scoring metrics (sklearn.metrics.consensus_score). By Kemal Eren.
  • Added Restricted Boltzmann Machines <rbm> (neural_network.BernoulliRBM). By Yann Dauphin.
  • Python 3 support by Justin Vincent <justinvf>, Lars Buitinck, Subhodeep Moitra <smoitra87> and Olivier Grisel. All tests now pass under Python 3.3.
  • Ability to pass one penalty (alpha value) per target in linear_model.Ridge, by @eickenberg and Mathieu Blondel.
  • Fixed an L2 regularization issue in sklearn.linear_model.stochastic_gradient.py (of minor practical significance). By Norbert Crombach <norbert> and Mathieu Blondel.
  • Added an interactive version of Andreas Müller's Machine Learning Cheat Sheet (for scikit-learn) to the documentation. See Choosing the right estimator <ml_map>. By Jaques Grobler.
  • grid_search.GridSearchCV and cross_validation.cross_val_score now support the use of advanced scoring functions such as area under the ROC curve and f-beta scores. See scoring_parameter for details. By Andreas Müller and Lars Buitinck. Passing a function from sklearn.metrics as score_func is deprecated.
  • Multi-label classification output is now supported by metrics.accuracy_score, metrics.zero_one_loss, metrics.f1_score, metrics.fbeta_score, metrics.classification_report, metrics.precision_score and metrics.recall_score by Arnaud Joly.
  • Two new metrics metrics.hamming_loss and metrics.jaccard_similarity_score are added with multi-label support by Arnaud Joly.
  • Speed and memory usage improvements in feature_extraction.text.CountVectorizer and feature_extraction.text.TfidfVectorizer, by Jochen Wersdörfer and Roman Sinayev.
  • The min_df parameter in feature_extraction.text.CountVectorizer and feature_extraction.text.TfidfVectorizer, which used to be 2, has been reset to 1 to avoid unpleasant surprises (empty vocabularies) for novice users who try it out on tiny document collections. A value of at least 2 is still recommended for practical use.
  • svm.LinearSVC, linear_model.SGDClassifier and linear_model.SGDRegressor now have a sparsify method that converts their coef_ into a sparse matrix, meaning stored models trained using these estimators can be made much more compact.
  • linear_model.SGDClassifier now produces multiclass probability estimates when trained under log loss or modified Huber loss.
  • Hyperlinks to documentation in example code on the website by Martin Luessi <mluessi>.
  • Fixed bug in preprocessing.MinMaxScaler causing incorrect scaling of the features for non-default feature_range settings. By Andreas Müller.
  • max_features in tree.DecisionTreeClassifier, tree.DecisionTreeRegressor and all derived ensemble estimators now supports percentage values. By Gilles Louppe.
  • Performance improvements in isotonic.IsotonicRegression by Nelle Varoquaux.
  • metrics.accuracy_score has an option normalize to return the fraction or the number of correctly classified samples. By Arnaud Joly.
  • Added metrics.log_loss that computes log loss, aka cross-entropy loss. By Jochen Wersdörfer and Lars Buitinck.
  • A bug that caused ensemble.AdaBoostClassifier to output incorrect probabilities has been fixed.
  • Feature selectors now share a mixin providing consistent transform, inverse_transform and get_support methods. By Joel Nothman.
  • A fitted grid_search.GridSearchCV or grid_search.RandomizedSearchCV can now generally be pickled. By Joel Nothman.
  • Refactored and vectorized implementation of metrics.roc_curve and metrics.precision_recall_curve. By Joel Nothman.
  • The new estimator sklearn.decomposition.TruncatedSVD performs dimensionality reduction using SVD on sparse matrices, and can be used for latent semantic analysis (LSA). By Lars Buitinck.
  • Added a self-contained example of out-of-core learning on text data: sphx_glr_auto_examples_applications_plot_out_of_core_classification.py. By Eustache Diemert <oddskool>.
  • The default number of components for sklearn.decomposition.RandomizedPCA is now correctly documented to be n_features. This was the default behavior, so programs using it will continue to work as they did.
  • sklearn.cluster.KMeans now fits several orders of magnitude faster on sparse data (the speedup depends on the sparsity). By Lars Buitinck.
  • Reduced the memory footprint of FastICA by Denis Engemann and Alexandre Gramfort.
  • Verbose output in sklearn.ensemble.gradient_boosting now uses a column format and prints progress with decreasing frequency. It also shows the remaining time. By Peter Prettenhofer.
  • sklearn.ensemble.gradient_boosting provides out-of-bag improvement oob_improvement_ rather than the OOB score for model selection. An example that shows how to use OOB estimates to select the number of trees was added. By Peter Prettenhofer.
  • Most metrics now support string labels for multiclass classification by Arnaud Joly and Lars Buitinck.
  • New OrthogonalMatchingPursuitCV class by Alexandre Gramfort and Vlad Niculae.
  • Fixed a bug in sklearn.covariance.GraphLassoCV: the alphas parameter now works as expected when given a list of values. By Philippe Gervais.
  • Fixed an important bug in sklearn.covariance.GraphLassoCV that prevented all folds provided by a CV object from being used (only the first three were used). When providing a CV object, execution time may therefore increase significantly compared to the previous version (the results are now correct). By Philippe Gervais.
  • cross_validation.cross_val_score and the grid_search module are now tested with multi-output data by Arnaud Joly.
  • datasets.make_multilabel_classification can now return the output in label indicator multilabel format by Arnaud Joly.
  • The nearest neighbors regressors (neighbors.KNeighborsRegressor and neighbors.RadiusNeighborsRegressor) and the radius neighbors classifier (neighbors.RadiusNeighborsClassifier) now support multioutput data. By Arnaud Joly.
  • Random state in LibSVM-based estimators (svm.SVC, svm.NuSVC, svm.OneClassSVM, svm.SVR, svm.NuSVR) can now be controlled. This is useful to ensure consistency in the probability estimates for the classifiers trained with probability=True. By Vlad Niculae.
  • Out-of-core learning support for the discrete naive Bayes classifiers sklearn.naive_bayes.MultinomialNB and sklearn.naive_bayes.BernoulliNB, by adding a partial_fit method (see the sketch after this list). By Olivier Grisel.
  • New website design and navigation by Gilles Louppe, Nelle Varoquaux, Vincent Michel and Andreas Müller.
  • Improved documentation on multi-class, multi-label and multi-output classification <multiclass> by Yannick Schwartz and Arnaud Joly.
  • Better input and error handling in the sklearn.metrics module by Arnaud Joly and Joel Nothman.
  • Speed optimization of the hmm module by Mikhail Korobov <kmike>.
  • Significant speed improvements for sklearn.cluster.DBSCAN by cleverless.
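
As a quick illustration of the out-of-core support mentioned above, the incremental-learning pattern enabled by partial_fit looks roughly like the following minimal sketch. The toy mini-batches are purely illustrative; a real application would stream its batches from disk or over the network::

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    # Toy count data split into two mini-batches; in a real out-of-core
    # setting each batch would be read from a stream instead.
    X_batches = [np.array([[2, 0, 1], [0, 3, 0]]),
                 np.array([[1, 1, 0], [0, 0, 4]])]
    y_batches = [np.array([0, 1]), np.array([0, 1])]

    clf = MultinomialNB()
    all_classes = np.array([0, 1])  # the full set of classes must be passed
                                    # on the first call to partial_fit

    for X, y in zip(X_batches, y_batches):
        clf.partial_fit(X, y, classes=all_classes)

    print(clf.predict(np.array([[3, 0, 0]])))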

API changes summary

  • The auc_score function was renamed to metrics.roc_auc_score.
  • Testing scikit-learn with sklearn.test() is deprecated. Use nosetests sklearn from the command line.
  • Feature importances in tree.DecisionTreeClassifier, tree.DecisionTreeRegressor and all derived ensemble estimators are now computed on the fly when accessing the feature_importances_ attribute; setting compute_importances=True is no longer required (see the sketch after this list). By Gilles Louppe.
  • linear_model.lasso_path and linear_model.enet_path can return their results in the same format as that of linear_model.lars_path. This is done by setting the return_models parameter to False. By Jaques Grobler and Alexandre Gramfort.
  • grid_search.IterGrid was renamed to grid_search.ParameterGrid.
  • Fixed bug in KFold causing imperfect class balance in some cases. By Alexandre Gramfort and Tadej Janež.
  • sklearn.neighbors.BallTree has been refactored, and a sklearn.neighbors.KDTree has been added which shares the same interface. The Ball Tree now works with a wide variety of distance metrics. Both classes have many new methods, including single-tree and dual-tree queries, breadth-first and depth-first searching, and more advanced queries such as kernel density estimation and 2-point correlation functions. By Jake Vanderplas.
  • Support for scipy.spatial.cKDTree within neighbors queries has been removed, and the functionality replaced with the new sklearn.neighbors.KDTree class.
  • sklearn.neighbors.KernelDensity has been added, which performs efficient kernel density estimation with a variety of kernels.
  • sklearn.decomposition.KernelPCA now always returns output with n_components components, unless the new parameter remove_zero_eig is set to True. This new behavior is consistent with the way kernel PCA was always documented; previously, the removal of components with zero eigenvalues was tacitly performed on all data.
  • gcv_mode="auto" no longer tries to perform SVD on a densified sparse matrix in sklearn.linear_model.RidgeCV.
  • Sparse matrix support in sklearn.decomposition.RandomizedPCA is now deprecated in favor of the new TruncatedSVD.
  • cross_validation.KFold and cross_validation.StratifiedKFold now enforce n_folds >= 2; otherwise a ValueError is raised. By Olivier Grisel.
  • datasets.load_files's charset and charset_errors parameters were renamed to encoding and decode_errors.
  • Attribute oob_score_ in sklearn.ensemble.GradientBoostingRegressor and sklearn.ensemble.GradientBoostingClassifier is deprecated and has been replaced by oob_improvement_.
  • Attributes in OrthogonalMatchingPursuit have been deprecated (copy_X, Gram, ...) and precompute_gram was renamed to precompute for consistency. See #2224.
  • sklearn.preprocessing.StandardScaler now converts integer input to float, and raises a warning. Previously it rounded for dense integer input.
  • sklearn.multiclass.OneVsRestClassifier now has a decision_function method. This will return the distance of each sample from the decision boundary for each class, as long as the underlying estimators implement the decision_function method. By Kyle Kastner.
  • Better input validation, with a warning on unexpected shapes for y.
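
As a small sketch of the feature-importances change above: the importances are now derived from the fitted model when the attribute is accessed, without any extra constructor flag. The iris dataset here is used only for illustration::

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()

    # No compute_importances=True is required any more: the importances
    # are computed on the fly when feature_importances_ is accessed.
    clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
    print(clf.feature_importances_)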

People

List of contributors for release 0.14 by number of commits.

  • 277 Gilles Louppe
  • 245 Lars Buitinck
  • 187 Andreas Mueller
  • 124 Arnaud Joly
  • 112 Jaques Grobler
  • 109 Gael Varoquaux
  • 107 Olivier Grisel
  • 102 Noel Dawe
  • 99 Kemal Eren
  • 79 Joel Nothman
  • 75 Jake VanderPlas
  • 73 Nelle Varoquaux
  • 71 Vlad Niculae
  • 65 Peter Prettenhofer
  • 64 Alexandre Gramfort
  • 54 Mathieu Blondel
  • 38 Nicolas Trésegnie
  • 35 eustache
  • 27 Denis Engemann
  • 25 Yann N. Dauphin
  • 19 Justin Vincent
  • 17 Robert Layton
  • 15 Doug Coleman
  • 14 Michael Eickenberg
  • 13 Robert Marchman
  • 11 Fabian Pedregosa
  • 11 Philippe Gervais
  • 10 Jim Holmström
  • 10 Tadej Janež
  • 10 syhw
  • 9 Mikhail Korobov
  • 9 Steven De Gryze
  • 8 sergeyf
  • 7 Ben Root
  • 7 Hrishikesh Huilgolkar
  • 6 Kyle Kastner
  • 6 Martin Luessi
  • 6 Rob Speer
  • 5 Federico Vaggi
  • 5 Raul Garreta
  • 5 Rob Zinkov
  • 4 Ken Geis
  • 3 A. Flaxman
  • 3 Denton Cockburn
  • 3 Dougal Sutherland
  • 3 Ian Ozsvald
  • 3 Johannes Schönberger
  • 3 Robert McGibbon
  • 3 Roman Sinayev
  • 3 Szabo Roland
  • 2 Diego Molla
  • 2 Imran Haque
  • 2 Jochen Wersdörfer
  • 2 Sergey Karayev
  • 2 Yannick Schwartz
  • 2 jamestwebber
  • 1 Abhijeet Kolhe
  • 1 Alexander Fabisch
  • 1 Bastiaan van den Berg
  • 1 Benjamin Peterson
  • 1 Daniel Velkov
  • 1 Fazlul Shahriar
  • 1 Felix Brockherde
  • 1 Félix-Antoine Fortin
  • 1 Harikrishnan S
  • 1 Jack Hale
  • 1 JakeMick
  • 1 James McDermott
  • 1 John Benediktsson
  • 1 John Zwinck
  • 1 Joshua Vredevoogd
  • 1 Justin Pati
  • 1 Kevin Hughes
  • 1 Kyle Kelley
  • 1 Matthias Ekman
  • 1 Miroslav Shubernetskiy
  • 1 Naoki Orii
  • 1 Norbert Crombach
  • 1 Rafael Cunha de Almeida
  • 1 Rolando Espinoza La fuente
  • 1 Seamus Abshere
  • 1 Sergey Feldman
  • 1 Sergio Medina
  • 1 Stefano Lattarini
  • 1 Steve Koch
  • 1 Sturla Molden
  • 1 Thomas Jarosch
  • 1 Yaroslav Halchenko