Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Local Adaptive Streaming Tree #1610

Merged
merged 20 commits into from
Sep 6, 2024
Merged

Conversation

danielnowakassis
Copy link
Contributor

Hi.

I've implemented my decision tree in River and run all the tests for adding a new estimator.

The published paper is here: https://dl.acm.org/doi/10.1145/3605098.3635899

We proposed an adaptive splitting mechanism, where a change detector monitors the error rate or data distribution purity (parameter) of the leaf node to determine the split point (and if merit > 0). It had some good results against SOTA decision trees, and I have also tested it in real datasets in River.

Accuracy :

accuracy

I'm open to any questions. LAST would be error rate monitoring, LAST_D would be data distribution purity monitoring.

Copy link
Member

@smastelini smastelini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @danielnowakassis, good stuff! I finished a first pass in the code and left some suggestions/comments :D

I really like the idea of using change detectors to decide when to split, rather than making checks at predefined intervals.

My main suggestion is leveraging existing functionalities via class inheritance. I think that if you make your class inherit from tree.HoeffdingTreeClassifier you could remove a lot of repeated code and only focus on the parts that really change from one type of tree to another.

river/tree/nodes/last_nodes.py Outdated Show resolved Hide resolved
river/tree/nodes/last_nodes.py Outdated Show resolved Hide resolved
river/tree/last_classifier.py Outdated Show resolved Hide resolved
river/tree/last_classifier.py Outdated Show resolved Hide resolved
river/tree/last_classifier.py Show resolved Hide resolved
river/tree/last_classifier.py Show resolved Hide resolved
@danielnowakassis
Copy link
Contributor Author

Hi @danielnowakassis, good stuff! I finished a first pass in the code and left some suggestions/comments :D

I really like the idea of using change detectors to decide when to split, rather than making checks at predefined intervals.

My main suggestion is leveraging existing functionalities via class inheritance. I think that if you make your class inherit from tree.HoeffdingTreeClassifier you could remove a lot of repeated code and only focus on the parts that really change from one type of tree to another.

Hi @danielnowakassis, good stuff! I finished a first pass in the code and left some suggestions/comments :D

I really like the idea of using change detectors to decide when to split, rather than making checks at predefined intervals.

My main suggestion is leveraging existing functionalities via class inheritance. I think that if you make your class inherit from tree.HoeffdingTreeClassifier you could remove a lot of repeated code and only focus on the parts that really change from one type of tree to another.

Hi Saulo,

Thank you for your fast and good review.

I agree that some parts of the code can be inherited by other classes. I will put some effort into this. LAST replaces the parameters grace period, tau threshold, and confidence level by change detector and track error. I will leave them as None so.

I'm glad that you liked my idea. I will send you the pdf

Copy link
Member

@smastelini smastelini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @danielnowakassis, thanks for the changes already in place!

I left some additional comments in your code and noticed some problems with the tests:

  • you need to run the pre-commit actions locally. It seems ruff is modifying some formating of the code in the CI, which makes the code quality tests fail
  • You will also need to implement current_merit for the remaining split criteria, as these classes cannot be currently instantiated with an abstract method (please check the tests logs for more details).

Also, please do not forget to add an entry in the release notes (docs/releases/unreleased.md) with your contributions (including the changes in arff files handling)

river/stream/iter_arff.py Show resolved Hide resolved
river/tree/last_classifier.py Outdated Show resolved Hide resolved
river/tree/last_classifier.py Outdated Show resolved Hide resolved
river/tree/nodes/last_nodes.py Outdated Show resolved Hide resolved
river/tree/nodes/last_nodes.py Show resolved Hide resolved
river/tree/nodes/last_nodes.py Outdated Show resolved Hide resolved
river/tree/nodes/last_nodes.py Outdated Show resolved Hide resolved
river/tree/last_classifier.py Outdated Show resolved Hide resolved
Copy link
Member

@smastelini smastelini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the contribution, @danielnowakassis. I really like the idea behind LAST and I am eager to see the next steps of this theoretical framework.

As a note for posterity: @danielnowakassis and I have identified a possible way to improve the tree module via some refactoring, much akin to what was done to the tree splitters a long time ago. The impacts of these changes are yet to be measured. An issue will be open to track this effort.

@smastelini smastelini merged commit e3b2163 into online-ml:main Sep 6, 2024
4 checks passed
@danielnowakassis
Copy link
Contributor Author

Thank you so much for the contribution, @danielnowakassis. I really like the idea behind LAST and I am eager to see the next steps of this theoretical framework.

As a note for posterity: @danielnowakassis and I have identified a possible way to improve the tree module via some refactoring, much akin to what was done to the tree splitters a long time ago. The impacts of these changes are yet to be measured. An issue will be open to track this effort.

Thank you, @smastelini. We did a great job :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants