Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT Split Tree into a BaseTree and a Tree subclass to allow easier inheritance #25119

Open
adam2392 opened this issue Dec 6, 2022 · 0 comments · May be fixed by #25118 or #25448
Open

MAINT Split Tree into a BaseTree and a Tree subclass to allow easier inheritance #25119

adam2392 opened this issue Dec 6, 2022 · 0 comments · May be fixed by #25118 or #25448
Labels

Comments

@adam2392
Copy link
Member

adam2392 commented Dec 6, 2022

Summary

#24678 introduces a modularization of Criterion to allow different criterion to be used with the same classes.
#25101 introduces a modularization of Splitter to allow different types of of splits to be computed.

Now comes the time to also modularize the Tree class. A good Tree class should enable oblique splits, causal leaf nodes (i.e. leaf nodes set differently from split nodes), quantile trees (leaf nodes set differently from split nodes) and unsupervised trees. Note another feature of causal trees is 'honesty', which should be easier to add after this issue is resolved.

Proposed improvement

We will have the following improvements:

  1. Refactor tree._add_node() to set the split node and leaf node differently.
  2. Refactor to have a 'splitptr' for SplitRecord, which allows for generalizations of the SplitRecord.
  3. Separate Tree into generic and abstract base functions for BaseTree and specific supervised axis-aligned functions for Tree

Once the changes are made, one should verify:

  1. If tree submodule's Cython code still builds (i.e. make clean and then pip install --verbose --no-build-isolation --editable . should not error out)
  2. verify unit tests inside sklearn/tree all pass
  3. verify that the asv benchmarks do not show a performance regression.

asv continuous --verbose --split --bench RandomForest upstream/main <new_branch_name> and then for side-by-side comparison asv compare main <new_branch_name>

Reference

As discussed in #24577 , I wrote up a doc on proposed improvements to the tree submodule that would:

  1. make it easier for 3rd party packages to subclass existing sklearn tree code and
  2. make it easier for sklearn itself to make improvements to the tree code with many of the modern improvements to trees

cc: @jjerphan

@github-actions github-actions bot added the Needs Triage Issue requires triage label Dec 6, 2022
@thomasjpfan thomasjpfan added module:tree Refactor Code refactor and removed Needs Triage Issue requires triage labels Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants