Understanding early tree termination #456

mcberko · 2025-01-17T23:31:04Z

Hi there,

I understand how pure variables are excluded from the pool of possible splitting variables. Could you clarify how this interacts with mtry when mtry < p? Specifically, is there some way in which the number of pure vs. non-pure variables could cause a tree to stop growing prematurely? For example, if I simulate binary predictors with p=4 (all important predictors) such that I have 16 unique factor-level combinations, set mtry=4 and set nodesize=1, I get 16 terminal nodes as expected. But when I set mtry=2 (or anything less than 4), I get <16 terminal nodes. I'm trying to understand what could cause this. Thank you!

ishwaran · 2025-01-18T04:04:01Z
Collaborator

Setting mtry=p turns off random feature selection and essentially you are growing a bootstrapped (bagged) tree. This is a less randomized tree.

When mtry<p this creates a more random tree and unusual things can happen.

To see how this affects the depth of a tree with discrete features, I suggest you use the get.tree function and plot a few trees in your browser. Maybe this will give you some insight into what's going on.

get.tree help page

1 reply

mcberko Jan 18, 2025
Author

Thank you -- this is cool and indeed shows early termination, but doesn't show the reasons for early termination, unless I'm being obtuse. I was hoping to understand the "why". I can only think of two reasons:

1. The feature(s) selected as split candidates have already been exhausted (they're pure). But I recall Udaya confirming that this does not occur -- that only non-pure features are selected for a potential split.
2. No improvement in the error metric (squared error loss, logrank statistic, etc.). I suppose this could be the reason, but in my setting, I have 1000 observations, 4 binary features, and a continuous response, and so a maximum of 16 terminal nodes could occur (each with plenty of observations).

Would it be #2? Or might there be another reason for failing to split in the standard regression setting?

Thanks!

kogalur · 2025-01-20T13:17:41Z
Maintainer

An x-var is removed from mtry selection (from this point onward in the splitting process) if it is first selected as an mtry candidate, and is then detected as being pure. Consider a worst-case scenario where mtry = 1. We select an x-var j, 1 <= j <= p. We split successfully on j at the root node. As a result of the split, an x-var k, k != j, becomes pure in the left daughter. The algorithm has no purity information on k at this point. In the next split attempt, on the left daughter, we randomly select k as our mtry candidate. The node terminates because we cannot split on k, and we have exhausted our mtry attempts. As you increase mtry toward p, you will get deeper trees.
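The worst case described above can be imitated with a toy grower in base R. This is a minimal sketch, not the randomForestSRC implementation: `count_leaves` and its arguments are hypothetical names, the response is ignored, and a node simply splits on the first non-pure candidate instead of searching for the best split. It assumes only what the reply states: a variable is dropped from the candidate pool once it has been drawn as an mtry candidate and found to be pure, and a node terminates when every drawn candidate is pure.

```r
# Toy sketch only: NOT the randomForestSRC algorithm.
set.seed(1)
n <- 1000
p <- 4
x <- matrix(sample(0:1, n * p, replace = TRUE), n, p)  # independent binary predictors

# Count the terminal nodes of one toy tree.
# rows:    row indices of x that fall in the current node
# allowed: x-vars not yet flagged as pure on the path to this node
count_leaves <- function(rows, allowed, mtry) {
  if (length(allowed) == 0) return(1L)
  k <- min(mtry, length(allowed))
  cand <- allowed[sample.int(length(allowed), k)]  # draw the mtry candidates
  split_var <- NA
  for (j in cand) {
    if (length(unique(x[rows, j])) > 1) {  # purity is only tested on drawn candidates
      split_var <- j
      break
    }
    allowed <- setdiff(allowed, j)  # found pure: excluded from here downward
  }
  if (is.na(split_var)) return(1L)  # every drawn candidate was pure: node terminates
  left  <- rows[x[rows, split_var] == 0]
  right <- rows[x[rows, split_var] == 1]
  count_leaves(left, allowed, mtry) + count_leaves(right, allowed, mtry)
}

leaves_full  <- count_leaves(seq_len(n), seq_len(p), mtry = p)
leaves_mtry1 <- replicate(200, count_leaves(seq_len(n), seq_len(p), mtry = 1))
print(leaves_full)
print(table(leaves_mtry1))
```

In this sketch the variable that was just split on stays in the pool, because nothing has flagged it as pure yet. With mtry = 1 a daughter node can draw it, find it pure, and stop, even though the other predictors could still be split.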

ishwaran · 2025-01-20T14:48:42Z
Collaborator

If your goal is to improve MSE by encouraging a deeper tree, then consider the following which might help.

1. Make sure your 0/1 binary predictors are real-valued (no need for factors).
2. Form all pairwise interactions. So if you have 4 binary predictors this is 4 choose 2 = 6 interactions.
3. Add the interactions to your data frame. So now you have 4 + 6 = 10 features.

Run the forests on the 10 features and see if that improves things.
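The steps above might be sketched in base R as follows. The data are simulated for illustration, and the names `x`, `inter` and `x10` are placeholders rather than anything taken from the thread.

```r
set.seed(1)
n <- 200

# Step 1: four 0/1 predictors stored as real-valued columns (not factors)
x <- data.frame(matrix(as.numeric(sample(0:1, n * 4, replace = TRUE)), n, 4))
names(x) <- paste0("x", 1:4)

# Step 2: all pairwise interactions, choose(4, 2) = 6 of them
pairs <- combn(names(x), 2)  # 2 x 6 matrix of column-name pairs
inter <- apply(pairs, 2, function(nm) x[[nm[1]]] * x[[nm[2]]])
colnames(inter) <- apply(pairs, 2, paste, collapse = ".")

# Step 3: append the interactions, giving 4 + 6 = 10 features
x10 <- cbind(x, inter)
print(names(x10))
```

The resulting `x10` can then be combined with the response in a data frame and passed to the forest in place of the original four predictors.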

mcberko · 2025-01-26T00:24:40Z
Author

Thank you both very much. I'm really just trying to understand what the algorithm is doing in a relatively simple data scenario, not trying to improve performance for now.

So, sticking with p=4 independent binary predictors, with a sufficiently high sample size (I'm using n=1200), I'm still trying to understand WHY trees will not be fully grown when mtry=1 and nodesize=1. In many cases, I get as few as 2 or 3 terminal nodes. Printing out the trees, as per Ishwaran's suggestion, indeed shows early termination (as does printing out the "leaf.count" values). When mtry=p, the # of terminal nodes = unique factor-level combinations = 16 (2^4) in this case. But when mtry < p, the # of terminal nodes < 16. The example that Kogalur gives about purity information not being available won't occur in my setting because the predictors are all independent and important (i.e., their effect is far from 0).

If a pure variable cannot be selected as a candidate to be split on (other than in the case Kogalur states, which doesn't apply to the setting I described above), then my understanding is that only non-pure variables will be selected, from which a split should then proceed. But apparently not. So then, what else could be leading the trees to not be fully grown? Are there other constraints that I'm failing to appreciate that could cause such early termination?

kogalur · 2025-01-27T14:07:09Z
Maintainer

Purity information is only available for those x-variables that have been the subject of mtry attempts, and it is known only at the parent node (the node that was split). Go back and look at the worst-case scenario above. The x-variable j was selected as our single mtry attempt and was split at the root node. We know that j is impure at the root, because we were able to split on it. We don't know whether j is still impure in one or both daughter nodes, so it retains the parent state as a valid mtry x-variable. Impure x-variables routinely turn pure in a daughter after a split. We won't know that until we try to split on them further down the tree. In the worst-case scenario above, we pick k as the single mtry candidate in the daughter. Then we test for purity, and we find that k is pure. So we terminate.

You state that the scenario above does not apply to your case. I think this is incorrect. Say that j is the colour of your eyes, and k is the colour of your hair. If we split people on the colour of their eyes at the root node, it might be that all people with brown hair end up going into the left daughter after the split on the root node. Next, we select hair colour as our mtry candidate in the left daughter. But we detect that it is now pure. So we terminate. We don't get another chance to split on any other x-variable, because mtry is one.

4 replies

ishwaran Jan 27, 2025
Collaborator

That's a great example. Thanks Udaya.

mcberko Jan 27, 2025
Author

Thank you again, but I still don't think this scenario applies to my setting since the binary predictors I simulated were independent. I just simplified things even more -- maybe this will make it easier to figure out where I'm misunderstanding. I simulated 2 important (non-zero but not identical effects), independent, binary predictors (probability=0.5 for both levels), and used n=1200 normally distributed responses. The independence should mean that the scenario you describe doesn't happen, i.e., splitting on x1 leads to daughter nodes where x2 is still impure (though the algorithm doesn't know that because x2 hasn't been tried as a candidate yet).

I then used your "quantreg" function with mtry=1 and nodesize=1, then had a look at the "leaf.count" values across the forest. I then printed out a tree that only had 2 terminal nodes. I see that that tree split on x2, then terminated.

I manually subsetted the data into x2=0 and x2=1; x1 is definitely not close to pure in either subset -- I have a roughly equal proportion of observations with x1=0 and x1=1 -- which means the tree should split on x1 in both daughter nodes, right? (I'm ignoring bootstrapping here, but I have plenty of data such that bootstrapping wouldn't lead to daughter nodes with a pure x1.) Is it possible the code does actually allow x2 to be selected again as a candidate even though it's pure?
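The manual check described above can be reproduced with a short base R sketch. The simulated data only stand in for the setting in this post (two independent binary predictors, probability 0.5 per level, n = 1200), and the object names are illustrative.

```r
set.seed(1)
n  <- 1200
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)

# Cross-tabulate x1 within each level of x2 (the two daughters after a split on x2)
tab <- table(x1 = x1, x2 = x2)
print(tab)
print(prop.table(tab, margin = 2))  # proportion of x1 = 0 / 1 within each x2 subset

# x1 is pure in a daughter only if it takes a single value there
x1_pure <- tapply(x1, x2, function(v) length(unique(v)) == 1)
print(x1_pure)
```

If both entries of `x1_pure` are FALSE, x1 is still splittable in both daughters, so a tree that stops after splitting on x2 cannot have stopped because x1 was pure.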

kogalur Jan 27, 2025
Maintainer

I would make a simulated data set, with n <= 100. Set p = 2 or 4. Keep it small. Set nodedepth = 2 so we only split twice at the non-trivial depth. Set mtry = 1. Set bootstrap off. Then take a look at obj$forest$nativeArray. You'll be able to read off the splits and see what's happening.

mcberko Jan 27, 2025
Author

OK, I've tried that. For one of the trees with 2 terminal nodes, I get the following output:

I'm not sure if this helps explain WHY the tree terminates after just the one split though. This just seems to reaffirm that the tree does indeed terminate after splitting on x2.

ishwaran · 2025-01-27T19:41:20Z
Collaborator

Here's a simple example that shows how to get 2^4=16 terminal nodes with p=4 binary (0/1) predictors.

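library(randomForestSRC)

## simulate p = 4 independent binary predictors (coded 1/2) and a pure-noise response
n <- 1e4
p <- 4
xvalue <- c(1,2)
x <- matrix(sample(xvalue, size=n*p, replace=TRUE), n)
y <- rnorm(n)

## mtry = 1: a single candidate variable per node, no bootstrap, deterministic splits
o <- rfsrc(y~.,data.frame(y=y,x),bootstrap="none",mtry=1,nodesize=1,nsplit=0)
print(o)
## distribution of terminal node counts across the trees
print(summary(factor(o$leaf.count)))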


                         Sample size: 10000
                     Number of trees: 500
           Forest terminal node size: 1
       Average no. of terminal nodes: 5.868
No. of variables tried at each split: 1
              Total no. of variables: 4
       Resampling used to grow trees: none
    Resample size used to grow trees: 10000
                            Analysis: RF-R
                              Family: regr
                      Splitting rule: mse
  2   3   4   5   6   7   8   9  10  11  12  13 
 27  44  67 107  74  53  61  41  17   6   2   1 


o2 <- rfsrc(y~.,data.frame(y=y,x),bootstrap="none",mtry=4,nodesize=1,nsplit=0)
print(o2)
print(summary(factor(o2$leaf.count)))

                         Sample size: 10000
                     Number of trees: 500
           Forest terminal node size: 1
       Average no. of terminal nodes: 16
No. of variables tried at each split: 4
              Total no. of variables: 4
       Resampling used to grow trees: none
    Resample size used to grow trees: 10000
                            Analysis: RF-R
                              Family: regr
                      Splitting rule: mse
 16 
500

9 replies

kogalur Jan 27, 2025
Maintainer

No, in the above, we cannot say anything about x1 or x2. They remain permissible split variables. The only way we exclude a variable is when we select it as an mtry candidate and find that it is pure as we determine the unique split values. If it is pure, we exclude it from that point onward in the tree.

mcberko Jan 27, 2025
Author

Okay, so if a variable is selected as an mtry and is pure, it is excluded from that point onward; and then, I suppose, the tree won't switch to another variable to make a split, but instead it terminates?

I suppose this is confusing to me because of the p=2, mtry=1 setting. If we had p=4, with mtry=2, let's say we split on x1 at the parent node. Then, at the left daughter node, x2 becomes pure; and let's say x2 and x3 are selected as mtry candidates. That daughter will recognize x2 as pure, thus excluding it from that point onward in the tree, and the daughter node will split on x3. Do I have it right?

kogalur Feb 3, 2025
Maintainer

Yes, that is correct. Another example: split at the root node on X1, then branch left and split on X2. Then branch left and try to split on X1 again. If the tree terminates because of purity on X1, it will be because the tree is unaware that the first split on X1 caused X1 to become pure.

mcberko Feb 3, 2025
Author

Great, thanks! My only follow-up question would be: what is the justification for writing the algorithm this way? Would it not make more sense to run something like "length(unique(x.var))==1" on every predictor, then exclude it as a candidate if TRUE, so as not to terminate the tree early? I get that there's a bias-variance trade-off going on, and that early termination increases tree diversity, which can help lower the variance part of the error. But nodesize already controls early termination directly, so why introduce another pathway that causes early termination? Or maybe there's some other justification besides tree diversity?

kogalur Feb 3, 2025
Maintainer

Consider a case where p is large, and you want to grow trees faster using mtry to limit the number of variables chosen at each node. The statement "length(unique(x.var)) == 1" involves sorting each x.var, and then determining uniqueness. This operation is linear in p, but it also depends on n. The cost of sorting and uniquifying, in a worst-case scenario, is maybe around O(n^2 + n). So we are looking at O(p x (n^2 + n)) at the root node. This is expensive. It gets less expensive as we go along, but assuming a symmetric tree and equally divided splits, you get a sequence depending on n, n/2, n/4, n/8. There will be n-1 internal nodes. Then multiply all that by ntree.

We leave it up to the user to set mtry = p if they actually want to test each variable. They can also set nsplit = 0 and choose deterministic splitting if they truly want the best statistic at each node. So we leave all those choices up to the user.
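For comparison, the eager check proposed in the question could look roughly like the base R sketch below. The helper `is_pure` and the toy node data are hypothetical and not part of the package. The sketch only shows that the test has to scan every predictor over all observations in the node, which is the per-node cost discussed above.

```r
# Hypothetical helper: flag every predictor that takes a single value in a node
is_pure <- function(node_data) {
  vapply(node_data, function(v) length(unique(v)) == 1L, logical(1))
}

# Toy node with four observations: x1 and x3 are pure, x2 is not
node <- data.frame(x1 = c(0, 0, 0, 0),
                   x2 = c(0, 1, 1, 0),
                   x3 = c(1, 1, 1, 1))

pure <- is_pure(node)
candidates <- names(node)[!pure]  # only non-pure predictors remain eligible
print(pure)
print(candidates)
```

Running such a check on all p predictors at every node removes the lazy-detection pathway to early termination, at the price of extra work that grows with both p and the node size.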

kogalur · 2025-01-27T20:19:57Z
Maintainer

Use Dr. Ishwaran's example, but set n = 100 or less, p = 4, ntree = 1, mtry = 1, nodesize = 1, nsplit = 0, bootstrap = "none".

Use

m = data.frame(y=y, x)

and for repeatability

set.seed(-1).
seed = -1

So

o = rfsrc(y~., m, bootstrap="none", ntree=1, mtry=1, nodedepth=2, nsplit=0, seed=-1)

Take a look at o$forest$nativeArray

and actually subset the data at the root split, and then at each subsequent split according to $parmID and $contPT. You'll find that you get pure nodes quite fast. This is a bit artificial because we only have binary variables. Here, in this example, if we split and send x1=1 to the left and x1=2 to the right, we know that x1 will be pure on the left and right. But in the general case, when real values are involved, we don't actually know the purity of x1 after the split. So if we try to split on x1 again, we will fail.
