Output tree$nodes[[i]]$samples #258

predt · 2018-07-10T17:10:46Z

Hello @jtibshirani
Quick question: for a given terminal node "i" of a tree (i.e., a leaf),
does tree$nodes[[i]]$samples contain the observations from the training subsample used to build the tree (J1 in the paper) that fall in that leaf, or the observations from the other subsample (J2) that fall in that leaf?
Thanks!

@predt I'm sorry I missed your question earlier! Would you be able to open a new issue with this question, and I will add a detailed answer there? Keeping each issue scoped to one topic helps ensure that other users with the same question will be able to find the answer as well. To answer briefly, that vector only contains examples from the second subsample (J2).

Thanks, @jtibshirani. Since tree$nodes[[i]]$samples corresponds to J2, the complement of the leaf samples within drawn_samples should give me the set of samples in J1. Is that correct?
I'm working on the appendix of an application of GRF. I'm using an example tree figure to make the explanation of how a tree is built more pedagogical. I want to add the theta.hat.P values that result from splitting a node (theta.hat.P is the notation in the paper) to illustrate how splits favor heterogeneity in a generalized causal forest. That is why I'm looking for the J1 samples. Thanks.
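A sketch of how the split score relates to these estimates, as we recall it from the GRF paper (please check the exact form against the paper before reusing it). Here $\hat\theta_{C_1}$ and $\hat\theta_{C_2}$ are the child estimates computed on the splitting sample J1, $n_{C_j}$ is the number of J1 samples in child $C_j$, and $n_P = n_{C_1} + n_{C_2}$. The second line is only a plain algebraic identity. It holds when $\hat\theta_P$ is the size-weighted average of the two child estimates (for example, when each estimate is a simple mean), which is not true in general for causal forests.

$$
\Delta(C_1, C_2) = \frac{n_{C_1}\, n_{C_2}}{n_P^2}\left(\hat\theta_{C_1} - \hat\theta_{C_2}\right)^2
$$

$$
\text{if } \hat\theta_P = \frac{n_{C_1}}{n_P}\hat\theta_{C_1} + \frac{n_{C_2}}{n_P}\hat\theta_{C_2}, \text{ then } \Delta(C_1, C_2) = \sum_{j=1}^{2} \frac{n_{C_j}}{n_P}\left(\hat\theta_{C_j} - \hat\theta_P\right)^2
$$

In that special case, a split scores higher the further the children's estimates move away from $\hat\theta_P$. This is the heterogeneity-seeking behavior a tree figure would illustrate.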

jtibshirani · 2018-07-10T18:49:57Z

You're right, drawn_samples will include all samples that went into constructing the tree. If honesty is enabled, this set includes both the samples used to perform splits (J1), and the samples that populate the leaf nodes (J2). If honesty is not enabled, these two sets are the same, and drawn_samples will be equal to the union of all samples in the leaf nodes.
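A minimal sketch of the relationship described above, checked on hand-built toy trees. It assumes the tree list mimics the shape returned by get_tree: a drawn_samples vector and a nodes list whose leaf elements have is_leaf = TRUE and a samples vector. The helper name describe_drawn_samples and the toy sample indices are hypothetical.

```r
# Hypothetical check: compare drawn_samples with the samples stored in the leaves.
# Assumes a get_tree()-like shape: `drawn_samples` plus a `nodes` list in which
# leaf nodes have `is_leaf = TRUE` and a `samples` vector.
describe_drawn_samples <- function(tree) {
  leaves <- Filter(function(node) isTRUE(node$is_leaf), tree$nodes)
  in_leaves <- unique(unlist(lapply(leaves, function(node) node$samples)))
  # Leaves should only ever hold samples that were drawn for this tree.
  stopifnot(all(in_leaves %in% tree$drawn_samples))
  if (setequal(tree$drawn_samples, in_leaves)) {
    "no honesty: drawn_samples equals the union of leaf samples"
  } else {
    "honesty: some drawn samples (J1) were used only for splitting"
  }
}

# Toy honest tree: samples 1 and 6 were used for splitting (J1) and are not in any leaf.
honest_tree <- list(
  drawn_samples = c(1, 4, 6, 9),
  nodes = list(
    list(is_leaf = FALSE, left_child = 2, right_child = 3),
    list(is_leaf = TRUE, samples = c(4)),
    list(is_leaf = TRUE, samples = c(9))
  )
)

# Toy non-honest tree: every drawn sample also populates a leaf.
adaptive_tree <- honest_tree
adaptive_tree$nodes[[2]]$samples <- c(1, 4)
adaptive_tree$nodes[[3]]$samples <- c(6, 9)

print(describe_drawn_samples(honest_tree))
print(describe_drawn_samples(adaptive_tree))
```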

I've kept this issue open and tagged it with 'documentation', so we remember to add an explanation to get_tree about the different list elements that are returned.

susanathey · 2018-07-10T20:17:50Z

It would be better to keep track of which is which (J1 and J2) for the use case of using the results from a single tree; it may also matter for different methods of calculating standard errors.

jtibshirani · 2018-07-11T12:30:21Z

@susanathey to clarify the exchange above: because you have access to both the leaf samples of a tree and the overall drawn_samples for that tree, both J1 and J2 can be calculated fairly easily. In particular, J2 is the union of nodes[[i]]$samples over all leaf nodes i, and J1 is the set difference of drawn_samples and J2.
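A minimal sketch of that computation, in the spirit of the code sample requested in this thread. It assumes a tree list with the get_tree shape (drawn_samples, plus a nodes list whose leaves have is_leaf = TRUE and a samples vector). The helper split_honest_samples and the toy tree are hypothetical, and the sample indices are made up.

```r
# Hypothetical helper: recover J1 and J2 for one tree.
# Assumes a get_tree()-like shape: `drawn_samples` plus a `nodes` list in which
# leaf nodes have `is_leaf = TRUE` and a `samples` vector.
split_honest_samples <- function(tree) {
  leaves <- Filter(function(node) isTRUE(node$is_leaf), tree$nodes)
  # J2: union of the samples stored in all leaves.
  j2 <- sort(unique(unlist(lapply(leaves, function(node) node$samples))))
  # J1: drawn samples that are not in any leaf, i.e. the ones used for splitting.
  j1 <- sort(setdiff(tree$drawn_samples, j2))
  list(J1 = j1, J2 = j2)
}

# Toy tree standing in for the output of get_tree(); values are made up.
toy_tree <- list(
  drawn_samples = c(2, 3, 5, 7, 8, 10, 11, 13),
  nodes = list(
    list(is_leaf = FALSE, left_child = 2, right_child = 3),
    list(is_leaf = TRUE, samples = c(3, 8)),
    list(is_leaf = TRUE, samples = c(11, 13))
  )
)

sets <- split_honest_samples(toy_tree)
print(sets$J1)  # 2 5 7 10
print(sets$J2)  # 3 8 11 13
```

With a real forest, the same helper would be called as split_honest_samples(grf::get_tree(forest, 1)). When honesty is disabled, it returns an empty J1: every drawn sample then sits in a leaf, so J1 and J2 coincide rather than J1 being empty.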

My intuition is that unless accessing both J1 and J2 is part of a common (and performance-sensitive) workflow, we shouldn't return those sets separately to avoid duplicating the same set in J1 and J2 when honesty isn't enabled. Let me know if that seems off.

susanathey · 2018-07-11T16:32:16Z

@jtibshirani Sorry I misunderstood. Maybe we can post a code sample and/or add it to our testing or demo code for users who might want to access them.

jtibshirani · 2018-07-30T06:49:23Z

I've updated the documentation in #268.

jtibshirani added the documentation label Jul 10, 2018

jtibshirani mentioned this issue Jul 30, 2018 in #266, "In get_tree, make sure that the leaf samples are 1-indexed." (merged)

jtibshirani closed this as completed Jul 30, 2018

jtibshirani mentioned this issue Sep 3, 2018 in #273, "Address poor performance of honest forests on small datasets." (closed)

austindenteh mentioned this issue Apr 30, 2019 in #403, "Obtain (conditional) outcome estimates and confidence intervals" (closed)
