
Commit

ENH Bring sparse input support to tree-based methods
Author:     Arnaud Joly <arnaud.v.joly@gmail.com>
            Fares Hedayati <fares.hedayati@gmail.com>
arjoly committed Nov 6, 2014
1 parent e01f225 commit c03c01a
Showing 11 changed files with 18,442 additions and 11,538 deletions.
2 changes: 1 addition & 1 deletion doc/modules/ensemble.rst
@@ -108,7 +108,7 @@ construction. The prediction of the ensemble is given as the averaged
prediction of the individual classifiers.

As other classifiers, forest classifiers have to be fitted with two
arrays: an array X of size ``[n_samples, n_features]`` holding the
arrays: a sparse or dense array X of size ``[n_samples, n_features]`` holding the
training samples, and an array Y of size ``[n_samples]`` holding the
target values (class labels) for the training samples::

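A minimal sketch of the new sparse support for forests (illustrative only, not taken from the patch; assumes SciPy and a scikit-learn version with this commit's sparse support):

```python
# Fit a forest on a sparse CSC matrix instead of a dense array.
from scipy.sparse import csc_matrix
from sklearn.ensemble import RandomForestClassifier

# Toy data mirroring the dense docs example, stored sparsely.
X = csc_matrix([[0.0, 0.0], [1.0, 1.0]])
y = [0, 1]

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)          # fit accepts sparse input
pred = clf.predict(X)  # predict accepts sparse input as well
```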
18 changes: 12 additions & 6 deletions doc/modules/tree.rst
@@ -90,10 +90,10 @@ Classification
:class:`DecisionTreeClassifier` is a class capable of performing multi-class
classification on a dataset.

As other classifiers, :class:`DecisionTreeClassifier` take as input two
arrays: an array X of size ``[n_samples, n_features]`` holding the training
samples, and an array Y of integer values, size ``[n_samples]``, holding
the class labels for the training samples::
As with other classifiers, :class:`DecisionTreeClassifier` takes as input two arrays:
an array X, sparse or dense, of size ``[n_samples, n_features]`` holding the
training samples, and an array Y of integer values, size ``[n_samples]``,
holding the class labels for the training samples::

>>> from sklearn import tree
>>> X = [[0, 0], [1, 1]]
@@ -157,7 +157,7 @@ a PDF file (or any other supported file type) directly in Python::

After being fitted, the model can then be used to predict new values::

>>> clf.predict(iris.data[0, :])
>>> clf.predict(iris.data[:1, :])
array([0])

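The corrected call above passes a 2D slice. A short sketch of why the ``[:1, :]`` slice matters (illustrative, assuming the standard iris loader):

```python
# predict expects a 2D array of shape (n_samples, n_features), so a single
# sample is selected with iris.data[:1, :] (shape (1, 4)) rather than
# iris.data[0, :] (shape (4,)).
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

one_sample = iris.data[:1, :]  # still 2D: one row, four features
pred = clf.predict(one_sample)
```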
.. figure:: ../auto_examples/tree/images/plot_iris_001.png
@@ -195,7 +195,6 @@ instead of integer values::
>>> clf.predict([[1, 1]])
array([ 0.5])


.. topic:: Examples:

* :ref:`example_tree_plot_tree_regression.py`
@@ -337,6 +336,13 @@ Tips on practical use
* All decision trees use ``np.float32`` arrays internally.
If training data is not in this format, a copy of the dataset will be made.

* If the input matrix X is very sparse, it is recommended to convert it to a
  sparse ``csc_matrix`` before calling ``fit`` and to a sparse ``csr_matrix``
  before calling ``predict``. Training time can be orders of magnitude faster
  for sparse matrix input compared to a dense matrix when features have zero
  values in most of the samples.

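The tip above can be sketched as follows (illustrative data and names, not from the patch; assumes SciPy is available):

```python
# Convert very sparse input to CSC for fit and CSR for predict.
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X_dense = (rng.random_sample((200, 50)) < 0.05).astype(np.float64)  # ~95% zeros
y = rng.randint(0, 2, size=200)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(csc_matrix(X_dense), y)          # CSC: fast column (feature) access during training
pred = clf.predict(csr_matrix(X_dense))  # CSR: fast row (sample) access during prediction
```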


.. _tree_algorithms:

413 changes: 207 additions & 206 deletions sklearn/ensemble/_gradient_boosting.c

Large diffs are not rendered by default.
