Option to return full decision paths when predicting with decision trees or random forest #2937

Closed
andosa wants to merge 16 commits into scikit-learn:master from andosa:tree_paths

Conversation

7 participants

andosa commented Mar 5, 2014

A very useful feature for decision trees is the option to access the full decision path for a prediction, i.e. the path from root to leaf for each decision.
This makes it possible to interpret the model in the context of the data in a very useful way. In particular, it allows one to see exactly why the tree or forest arrived at a particular result, breaking the prediction down into exact components (feature contributions). This is much needed in certain application areas; for example, in credit card fraud detection it is important to understand why the model labels a transaction as fraudulent.

The pull request implements a predict option for random forests and decision trees: when the return_paths keyword argument is set to True, paths are returned instead of predictions. I have not added docstrings yet, since I assume the API might be expected to be different (a separate method instead of a keyword argument to predict?).

In addition, there is a change to store values at each node, not just the leaves (useful when interpreting the tree).
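
A sketch of how the proposed keyword would be used (hypothetical call: return_paths and the padded path layout are this PR's proposal, not a released scikit-learn API; later in the thread this moves to a separate method):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

    # Proposed behaviour: one row of node ids per sample, root first,
    # padded with -1 on the right when a path is shorter than max_depth + 1.
    paths = clf.predict(iris.data[:3], return_paths=True)
    for path in paths:
        print([node_id for node_id in path if node_id != -1])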

Owner

GaelVaroquaux commented Mar 5, 2014

I assume the API might be expected to be different (another method
instead of keyword argument to predict?)

If we decide to go in this direction and add such functionality (I must say that I haven't understood the compelling argument for adding it), I think that it should be in the form of a new method, rather than overloading the predict method.

Owner

glouppe commented Mar 8, 2014

Thanks for your contribution @andosa !

While I think returning full paths is an interesting feature, I would personally be against adding your example compute_feature_contributions.py. What you call "feature contributions" is obviously non-standard and seems to work in only a few situations (currently, your code only works for regression or binary classification, with a single output). I would rather let users implement such functionality themselves, assuming they know what they are doing, than maintain it in master.

Also, as @GaelVaroquaux suggested, I wouldn't overload the predict method but would rather create a dedicated new method for computing prediction paths.

Owner

jnothman commented Mar 8, 2014

Part of the point is that scikit-learn's capability to explain its decisions currently requires a fairly deep understanding of the model internals. If the library will not provide an API for explaining its predictions (model introspection in general, as well as for particular input), then examples are probably a good idea, though I take your point that the analysis is non-standard in this case.

Owner

glouppe commented Mar 9, 2014

Part of the point is that scikit-learn's capability to explain its decisions currently requires a fairly deep understanding of the model internals. If the library will not provide an API for explaining its predictions (model introspection in general, as well as for particular input), then examples are probably a good idea, though I take your point that the analysis is non-standard in this case.

The exact contributions of features in a forest are computed with feature importances, as we currently have them. Computing the contribution of a feature as done in the example of this PR, as if the output value in a tree were an additive function of the input features, really is non-standard. (I don't say that it may not be useful, though.)

@andosa andosa * Created new methods for decision path computation instead of overloading the predict method
* Fixed a bug of the decision path array being typed int32 instead of SIZE_t
068f05b

Coverage Status

Coverage remained the same when pulling 068f05b on andosa:tree_paths into 4437d56 on scikit-learn:master.

andosa commented Mar 11, 2014

I've lifted the code into a separate function instead of overloading the predict method.

What you call "feature contributions" is obviously something non-standard and seems to only work in a few situations (currently, your code only work for regression or binary classification, for a single output)

Not sure what you mean by "few situations". If you mean single output only: it will work exactly the same way for multi-output trees, where you would simply have a list of contributions (instead of just one) from each feature, one contribution per output.
I could add a utility function that takes the samples as input and returns a table of shape (n_samples, 1 + n_features, n_outputs), each row representing the contribution from each feature (plus the trainset mean), summing up to the prediction you would get from the predict method. This would make the example less "ad hoc" and perhaps more reusable.

I do agree that the additive representation of tree predictions is somewhat non-standard, in the sense that it's typically not something you find in textbooks. At the same time I feel it is something more people should know about, since it is very useful in understanding the model's behavior on particular data (vs. the static view you get from feature importances).
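
For concreteness, a minimal sketch of this decomposition for a single-output regression tree, assuming values are stored at every node (which is exactly what this PR adds; the data and variable names are made up for illustration):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(200, 3)
    y = 3 * X[:, 0] + X[:, 1] + 0.1 * rng.rand(200)

    est = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
    tree = est.tree_

    x = X[0]
    node = 0
    bias = tree.value[0][0][0]               # trainset mean at the root
    contributions = np.zeros(X.shape[1])
    while tree.children_left[node] != -1:    # -1 marks a leaf
        feature = tree.feature[node]
        if x[feature] <= tree.threshold[node]:
            child = tree.children_left[node]
        else:
            child = tree.children_right[node]
        # attribute the change in node value to the feature split on
        contributions[feature] += tree.value[child][0][0] - tree.value[node][0][0]
        node = child

    # The prediction decomposes exactly into bias + per-feature contributions.
    assert np.isclose(bias + contributions.sum(), est.predict(X[:1])[0])

Each feature's contribution is just the sum of the value changes at the splits that tested it, which is why the decomposition generalizes directly to multi-output trees (one such sum per output).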

Ando Saabas added some commits Aug 25, 2014

Ando Saabas Merge remote-tracking branch 'upstream/master'
Conflicts:
	sklearn/ensemble/forest.py
	sklearn/tree/_tree.c
	sklearn/tree/_tree.pyx
	sklearn/tree/tests/test_tree.py
2881b60
Ando Saabas Fixed array2d->check_arrays 04098ff
Ando Saabas Fixed example path depth check bug 35ad7f5
Ando Saabas Adding cython generated _tree.c 7c0e173

Coverage Status

Coverage decreased (-0.02%) when pulling 7c0e173 on andosa:tree_paths into 4b82379 on scikit-learn:master.

Owner

jnothman commented Aug 26, 2014

I do agree that the additive representation of tree predictions is somewhat non-standard, in the sense that it's typically not something you find in textbooks. At the same time I feel it is something more people should know about, since it is very useful in understanding the model's behavior on particular data (vs. the static view you get from feature importances).

You say typically. Can you find a reference to this or another technique for quantitative decision path inspection?

andosa commented Aug 27, 2014

On the additive representation of predictions, see for example http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6642461

But even regardless of this particular technique, the ability to inspect the decision paths (along with the estimates at internal nodes, not just leaves) is something that is very useful to have in the data scientist's toolbox.

Owner

arjoly commented Aug 28, 2014

Hi,

I find this feature highly interesting. However, I have the feeling that it would be more appropriate as a public function that takes a fitted/unfitted estimator and calls the "private" method of the Tree structure, instead of adding a new method to the API.

I haven't looked in detail at the current example, but I have the feeling that it should be discussed or done in another pull request.

As an example, I would have expected something simpler: a fitted decision tree with the path of one sample in this tree (maybe on iris?).

Owner

jnothman commented Aug 28, 2014

it would be more appropriate as a public function that takes a fitted/unfitted estimator and calls the "private" method of the Tree structure, instead of adding a new method to the API.

I'm not sure what you mean. Could you give a usage example, @arjoly?

Owner

arjoly commented Aug 28, 2014

Sorry for the unclear reasoning. I will try to clarify.

For the functionality in this PR, I would use it like this:

est = DecisionTreeClassifier(...).fit(X, y)
for s in interesting_samples:
    path = compute_decision_path(est, X[s])
    # Plot decision tree and path

where compute_decision_path would be the new feature added by this PR. compute_decision_path would internally call est.tree_.decision_path(X).

Thinking about it, an alternative option could be to improve the apply method.

Owner

jnothman commented Aug 28, 2014

Keep apply() lightweight, IMO.


andosa commented Aug 28, 2014

Which module would compute_decision_path live in? Something like tree/tree_utils.py?

Owner

jnothman commented Aug 28, 2014

FWIW, I think I'd prefer to see the method as it is here; rather, I've not understood the benefit of @arjoly's proposal.


Coverage Status

Coverage decreased (-0.02%) when pulling 6fe3bb7 on andosa:tree_paths into 4b82379 on scikit-learn:master.

andosa commented Sep 3, 2014

I removed the feature contribution example. Agreed that it's probably a bit too involved and obscure to be in the examples list.

The pull request now includes nothing but two conceptual changes that both help with model and prediction inspection:

  • Values for trained trees are kept for each node, not just for leaves
  • Decision tree and random forest have a method to return paths.

As such, I think the pull request is very straightforward and non-controversial. Is there anything else I could do to speed up merging?
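
A sketch of how the two changes would combine, using the decision_paths method and path layout from this PR's docstrings (proposed names, not a released API):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

    # One row of node ids per sample, root first, padded with -1 on the right.
    paths = clf.decision_paths(iris.data[:1])
    for node_id in paths[0]:
        if node_id == -1:
            break
        # With values stored at every node, the running estimate along the
        # path can be inspected, not just the final leaf value.
        print(node_id, clf.tree_.value[node_id])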

andosa added some commits Oct 15, 2014

@andosa andosa Merge upstream 1a93cf7
@andosa andosa Merge branch 'tree_paths' of https://github.com/andosa/scikit-learn into tree_paths
8f3eef6
@andosa andosa Merge remote-tracking branch 'upstream/master' into tree_paths
Conflicts:
	appveyor.yml
	doc/modules/cross_validation.rst
	doc/modules/kernel_approximation.rst
	doc/modules/model_evaluation.rst
	doc/modules/sgd.rst
	doc/modules/unsupervised_reduction.rst
	doc/whats_new.rst
	sklearn/__init__.py
	sklearn/cluster/tests/test_bicluster.py
	sklearn/cluster/tests/test_hierarchical.py
	sklearn/cluster/tests/test_spectral.py
	sklearn/datasets/base.py
	sklearn/datasets/tests/test_svmlight_format.py
	sklearn/decomposition/base.py
	sklearn/decomposition/incremental_pca.py
	sklearn/decomposition/tests/test_incremental_pca.py
	sklearn/ensemble/bagging.py
	sklearn/ensemble/gradient_boosting.py
	sklearn/ensemble/tests/test_gradient_boosting.py
	sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py
	sklearn/ensemble/weight_boosting.py
	sklearn/feature_extraction/dict_vectorizer.py
	sklearn/feature_extraction/text.py
	sklearn/feature_selection/tests/test_feature_select.py
	sklearn/feature_selection/variance_threshold.py
	sklearn/gaussian_process/tests/test_gaussian_process.py
	sklearn/grid_search.py
	sklearn/linear_model/base.py
	sklearn/linear_model/cd_fast.c
	sklearn/linear_model/cd_fast.pyx
	sklearn/linear_model/coordinate_descent.py
	sklearn/linear_model/logistic.py
	sklearn/linear_model/ridge.py
	sklearn/linear_model/sgd_fast.c
	sklearn/linear_model/sgd_fast.pyx
	sklearn/linear_model/stochastic_gradient.py
	sklearn/linear_model/tests/test_coordinate_descent.py
	sklearn/linear_model/tests/test_least_angle.py
	sklearn/linear_model/tests/test_logistic.py
	sklearn/linear_model/tests/test_sgd.py
	sklearn/metrics/regression.py
	sklearn/metrics/tests/test_score_objects.py
	sklearn/neighbors/base.py
	sklearn/neighbors/kde.py
	sklearn/neighbors/nearest_centroid.py
	sklearn/neighbors/tests/test_nearest_centroid.py
	sklearn/neighbors/tests/test_neighbors.py
	sklearn/preprocessing/data.py
	sklearn/preprocessing/tests/test_data.py
	sklearn/svm/base.py
	sklearn/svm/bounds.py
	sklearn/svm/classes.py
	sklearn/svm/tests/test_svm.py
	sklearn/tests/test_common.py
	sklearn/tests/test_grid_search.py
	sklearn/tests/test_isotonic.py
	sklearn/tests/test_pipeline.py
	sklearn/tree/_tree.c
	sklearn/tree/_tree.pxd
	sklearn/tree/_tree.pyx
	sklearn/tree/tests/test_tree.py
	sklearn/tree/tree.py
	sklearn/utils/__init__.py
	sklearn/utils/extmath.py
	sklearn/utils/sparsefuncs.py
	sklearn/utils/testing.py
	sklearn/utils/tests/test_extmath.py
	sklearn/utils/tests/test_utils.py
	sklearn/utils/tests/test_validation.py
	sklearn/utils/validation.py
649c607
@andosa andosa Fix upstream merge 6b6d889

andosa commented Apr 23, 2015

Could we revisit this PR? The tree code seems to have stabilized now, and this is a simple but very useful feature for a large number of use cases (it could be moved to a private function).

Owner

amueller commented Apr 27, 2015

@jnothman I think the idea behind @arjoly's proposal is to not clutter the interface of the forest too much with very specialized methods. I can see where he is coming from, but I feel we rarely take that road, and it makes a feature a lot less discoverable. I am undecided on the issue.

@amueller amueller and 1 other commented on an outdated diff Apr 27, 2015

sklearn/tree/tests/test_tree.py
+def test_tree_paths_toy():
+    """Prediction by following tree paths and summing up feature contributions"""
+    for name, Tree in REG_TREES.items():
+        clf = Tree(random_state=0)
+        clf.fit(X, y)
+        straight_predictions = clf.predict(T)
+        paths = clf.decision_paths(T)
+        path_predictions = []
+        for path in paths:
+            base = 0
+            pred = 0
+            for i in range(len(path)):
+                if path[i] == -1:
+                    break
+                node_id = path[i]
+                pred += clf.tree_.value[node_id][0][0] - base
@amueller

amueller Apr 27, 2015

Owner

two white spaces after "=". Can you run pep8 on the file?

@andosa

andosa Apr 28, 2015

Committed pep8 fixes


@arjoly arjoly commented on the diff Apr 30, 2015

sklearn/ensemble/forest.py
+        Parameters
+        ----------
+        X : array-like of shape = [n_samples, n_features]
+            The input samples.
+
+        Returns
+        -------
+        y : list of arrays with shape = [n_estimators, n_samples, max_depth + 1]
+            Decision paths for each tree and for each prediction.
+            Each path is an array of node ids, starting with the root node id.
+            If a path is shorter than max_depth + 1, it is padded with -1 on
+            the right.
+        """
+
+        # Check data
+        if getattr(X, "dtype", None) != DTYPE or X.ndim != 2:
+            X = array2d(X, dtype=DTYPE)
@arjoly

arjoly Apr 30, 2015

Owner

Can you use the _validate_X_predict?

@arjoly arjoly commented on the diff Apr 30, 2015

sklearn/ensemble/forest.py
+            The input samples.
+
+        Returns
+        -------
+        y : list of arrays with shape = [n_estimators, n_samples, max_depth + 1]
+            Decision paths for each tree and for each prediction.
+            Each path is an array of node ids, starting with the root node id.
+            If a path is shorter than max_depth + 1, it is padded with -1 on
+            the right.
+        """
+
+        # Check data
+        if getattr(X, "dtype", None) != DTYPE or X.ndim != 2:
+            X = array2d(X, dtype=DTYPE)
+
+        # Assign chunk of trees to jobs
+        n_jobs, n_trees, starts = _partition_estimators(self)
@arjoly

arjoly Apr 30, 2015

Owner

only n_jobs is needed

n_jobs = _partition_estimators(self)[0]

@arjoly arjoly commented on the diff Apr 30, 2015

sklearn/tree/_tree.pyx
@@ -3212,6 +3211,36 @@ cdef class Tree:
         out = out.reshape(X.shape[0], self.max_n_classes)
         return out
+    cpdef np.ndarray decision_paths(self, np.ndarray[DTYPE_t, ndim=2] X):
@arjoly

arjoly Apr 30, 2015

Owner

Can you add support for sparse csr X?

@arjoly arjoly commented on the diff Apr 30, 2015

sklearn/tree/tree.py
+
+
+        Parameters
+        ----------
+        X : array-like of shape = [n_samples, n_features]
+            The input samples.
+
+        Returns
+        -------
+        y : array of shape = [n_samples, max_depth + 1]
+            Decision paths, where each path is an array of node ids, starting
+            with the root node id. If a path is shorter than max_depth + 1,
+            it is padded with -1 on the right.
+        """
+        if getattr(X, "dtype", None) != DTYPE or X.ndim != 2:
+            X = check_array(X, dtype=DTYPE)
@arjoly

arjoly Apr 30, 2015

Owner

Can you use _validate_X_predict?

@arjoly arjoly commented on the diff Apr 30, 2015

sklearn/tree/tree.py
+            with the root node id. If a path is shorter than max_depth + 1,
+            it is padded with -1 on the right.
+        """
+        if getattr(X, "dtype", None) != DTYPE or X.ndim != 2:
+            X = check_array(X, dtype=DTYPE)
+
+        n_samples, n_features = X.shape
+
+        if self.tree_ is None:
+            raise Exception("Tree not initialized. Perform a fit first")
+
+        if self.n_features_ != n_features:
+            raise ValueError("Number of features of the model must "
+                             " match the input. Model n_features is %s and "
+                             " input n_features is %s "
+                             % (self.n_features_, n_features))
@arjoly

arjoly Apr 30, 2015

Owner

Those checks are done in _validate_X_predict.

@arjoly arjoly commented on the diff Apr 30, 2015

sklearn/tree/_tree.pyx
@@ -2707,12 +2707,11 @@ cdef class DepthFirstTreeBuilder(TreeBuilder):
                          split.threshold, impurity, n_node_samples,
                          weighted_n_node_samples)
-                if is_leaf:
-                    # Don't store value for internal nodes
-                    splitter.node_value(tree.value +
-                                        node_id * tree.value_stride)
+                # Store value for all nodes, to facilitate tree/model inspection and interpretation
+                splitter.node_value(tree.value
+                                    + node_id * tree.value_stride)
@arjoly

arjoly Apr 30, 2015

Owner

This should be done in a separate pr with appropriate benchmarks as discussed in #4644.

@arjoly

arjoly May 12, 2015

Owner

This has been done in a different pr.

Owner

arjoly commented Apr 30, 2015

@jnothman I think the idea behind @arjoly's proposal is to not clutter the interface of the forest too much with very specialized methods. I can see where he is coming from, but I feel we rarely take that road, and it makes a feature a lot less discoverable. I am undecided on the issue.

Other opinion on this? (@glouppe ?)

@arjoly arjoly commented on the diff Jun 8, 2015

sklearn/ensemble/forest.py
@@ -98,7 +98,12 @@ def _parallel_build_trees(tree, forest, X, y, sample_weight, tree_idx, n_trees,
     else:
         tree.fit(X, y, sample_weight=sample_weight, check_input=False)
-    return tree
+    return tree
+
+
+#def _parallel_predict_paths(trees, X):
+#    """Private function used to compute a batch of prediction paths within a job."""
+#    return [tree.decision_paths(X) for tree in trees]
@arjoly

arjoly Jun 8, 2015

Owner

Could you remove this?

@arjoly arjoly commented on the diff Jun 8, 2015

sklearn/ensemble/forest.py
+            If a path is shorter than max_depth + 1, it is padded with -1 on
+            the right.
+        """
+
+        # Check data
+        if getattr(X, "dtype", None) != DTYPE or X.ndim != 2:
+            X = array2d(X, dtype=DTYPE)
+
+        # Assign chunk of trees to jobs
+        n_jobs, n_trees, starts = _partition_estimators(self)
+
+        # Parallel loop
+        path_list = Parallel(n_jobs=n_jobs, verbose=self.verbose,
+                             backend="threading")(
+            #delayed(_parallel_predict_paths)(
+            #    self.estimators_[starts[i]:starts[i + 1]], X)
+            #for i in range(n_jobs))
@arjoly

arjoly Jun 8, 2015

Owner

Could you remove this?

Owner

arjoly commented Jun 12, 2015

@andosa if you need some help with the sparse algorithm, I may find some time.

andosa commented Aug 12, 2015

Since 0.17 now has both value recording at the intermediate nodes and the option to return node ids for predictions via apply, the paths can be extracted from those, so an explicit method for this isn't actually needed anymore. I think this one should be closed?

I wrote an example of how to extract the paths with current master and decompose every prediction into bias + feature contribution components (such that for each prediction, prediction = bias + feature_1_contribution + ... + feature_n_contribution).
Writeup at http://blog.datadive.net/random-forest-interpretation-with-scikit-learn/.
Code at https://github.com/andosa/treeinterpreter.

A lot of people doing practical data science have found this very useful (and I don't know of any other ML library that exposes it). Do you think it might be worth including in scikit-learn's examples, or even in the tree utilities (like export_graphviz)?
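
For reference, a short usage sketch of the treeinterpreter package linked above (interface as of its early releases; the synthetic data is for illustration only):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from treeinterpreter import treeinterpreter as ti

    rng = np.random.RandomState(0)
    X = rng.rand(100, 4)
    y = X[:, 0] + 2 * X[:, 1] + 0.1 * rng.rand(100)

    rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
    prediction, bias, contributions = ti.predict(rf, X[:5])

    # For each sample: prediction = bias (trainset mean) + per-feature contributions
    print(prediction[0], bias[0] + contributions[0].sum())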

Owner

arjoly commented Aug 13, 2015

Since the apply method returns the leaf id and not the node ids along the path, I still think that there is some place for a decision path method.

andosa commented Aug 13, 2015

But isn't the leaf id the node id of the leaf? And since there is exactly one path to each leaf, you have a mapping from leaf ids to paths.

Owner

arjoly commented Aug 13, 2015

Yes, however computing the path in Python is likely to be slow. There are feature extraction methods based on extracting node indicator features for boosting and random forest based algorithms.

andosa commented Aug 13, 2015

Computing all the paths is a one-off procedure, just one DFS, after which you have a lookup table that maps leaf id -> path.
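
A minimal sketch of that lookup against current master, using only the public apply method and the tree_ arrays (iris is used purely for illustration):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
    tree = clf.tree_

    # One DFS from the root builds a leaf id -> path lookup table.
    leaf_to_path = {}
    stack = [(0, [0])]                       # (node id, path from the root)
    while stack:
        node, path = stack.pop()
        left, right = tree.children_left[node], tree.children_right[node]
        if left == -1:                       # -1 marks a leaf
            leaf_to_path[node] = path
        else:
            stack.append((left, path + [left]))
            stack.append((right, path + [right]))

    # apply() returns the leaf id per sample; the table recovers the full path.
    leaves = clf.apply(iris.data[:2])
    paths = [leaf_to_path[leaf] for leaf in leaves]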

Owner

amueller commented Aug 13, 2015

I agree that we should try to avoid too much Cython.
I think an example would be nice.

Owner

glouppe commented Oct 19, 2015

Closing this.

I'll open an issue to add an example showing how to extract prediction paths.

glouppe closed this Oct 19, 2015
