Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EXAMPLE DIFF] (Tree featuresv2) Fork of sklearn that maintains all necessary refactorings to enable downstream functionality #32

Merged
merged 180 commits into from
Mar 29, 2023

Conversation

adam2392
Copy link
Collaborator

@adam2392 adam2392 commented Jan 30, 2023

Reference Issues/PRs

This is the most up-to-date PR branch to consolidate all proposed refactor changes that work with:

  • unsupervised trees
  • oblique trees
  • no performance/runtime regressions against main

What does this implement/fix? Explain your changes.

Incorporates refactors to:

Internal Cython of scikit-learn's:

  • criterion
  • splitter
  • tree

Internals of Python in scikit-learns:

  • python Tree

Adds the basic implementation of oblique trees. The implementation of oblique trees has been tested on all sklearn's check_estimator testing function and has error-checking bounds for the new hyperparameter introduced, which is feature_combinations that defaults to min(1.5, n_features).

TODO:

  1. Add honest support for trees (splitting the data at the Python API level)
  2. Build wheels
  3. Brainstorm unit-tests, or weekly checks to determine when our fork is out-of-date compared to upstream sklearn
  4. Revamp README for the fork

Any other comments?

Eventually, we want to build wheels in order to make this fork "maintainable".

adam2392 and others added 30 commits March 7, 2022 17:28
Co-authored-by: Chester Huynh <chester.huynh924@gmail.com>
Co-authored-by: Parth Vora <pvora4@jhu.edu>
Co-Authored-By: Thomas Fan <thomasjpfan@gmail.com>
Co-Authored-By: Chester Huynh <chester.huynh924@gmail.com>
Co-Authored-By: Parth Vora <pvora4@jhu.edu>
tests.

The issues still are:

- sparse dataset support, which I could possibly split into a follow-up PR?
- pickling does not work on roundtrip for some reason
- certain issues with max_leaf
@adam2392 adam2392 changed the base branch from main to fork March 26, 2023 14:54
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
#### Reference Issues/PRs
Closes: #25 
Closes: #23 


#### What does this implement/fix? Explain your changes.
Adds preliminary capability for binning features. The cons is we need to
"bin" again during predict time.

I've documented how we can get around this in the README, but it will involve some
heavy-duty Cython coding.

- Remove Oblique tree models and migrated to scikit-tree
- Fix up CI so that way unnecessary workflows are not ran on the fork
- Updated the documentation with the current limitation of binning

---------

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 merged commit 0ac8a71 into fork Mar 29, 2023
16 of 18 checks passed
@adam2392 adam2392 deleted the tree-featuresv2 branch April 4, 2023 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants