Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT Refactor Tree Cython class to support modularity #25448

Open
wants to merge 71 commits into
base: main
Choose a base branch
from

Conversation

adam2392
Copy link
Contributor

@adam2392 adam2392 commented Jan 20, 2023

Reference Issues/PRs

Fixes: #25119
Closes: #24746
Closes: #24000

Supersedes: #25118

Requires #24678 to be merged first, since this is a fork of that branch.

This ends up being relatively large, and the below changes can be broken up probably into 2 PRs. One for splitting Tree -> BaseTree and Tree.

What does this implement/fix? Explain your changes.

  1. Splits Tree class into a BaseTree and Tree class: The BaseTree does not assume any specifics on how nodes are split, how leaf nodes are set. This paves the path for enabling new trees such as: i) oblique trees, ii) causal trees and iii) quantile trees.
  • This would enable someone to easily subclass the Tree API without requiring specific data structures
  1. Adds new functions _set_split_node(), _set_leaf_node(), _compute_feature(), _compute_feature_importances() to allow subclasses of BaseTree to define any decision tree that generalizes in any one of those directions.
  • This would allow us to easily implement quantile-trees, honest-trees

Any other comments?

Cross-referencing:

Reference for quantile-tree pseudocode demo: https://zillow.github.io/quantile-forest/auto_examples/plot_quantile_vs_standard_forest.html#sphx-glr-auto-examples-plot-quantile-vs-standard-forest-py

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Contributor Author

@jjerphan as a quick proof of concept, I'm pushing up a "pseudocode-demo" example of how the refactored Tree API would enable a quantile tree. See: sklearn/tree/_quantile_tree_demo.pyx.

My impression is that there is interest in this also from the core devs. What next steps are necessary, or should I wait until #24678 and #24990 is merged first? As a reminder, these 3 PRs would enable i) general modularity of the Tree codebase, ii) unsupervised trees, oblique trees and other splitting functions and iii) allow others to build quantile/honest trees that add more data to leaf nodes.

Tagging @Vincent-Maladiere who said he was interested in helping out. If you have time, can you give this a look to see if the new Tree API would make quantile-trees work?

@jjerphan
Copy link
Member

Hi,

What next steps are necessary, or should I wait until #24678 and #24990 is merged first?

The blocker is maintainers' and reviewers' bandwidth mainly, I am afraid.

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants