Adds Extremely Randomized Trees Algorithm and Min Impurity Decrease #51

samanthacampo · 2020-09-29T19:50:09Z

Description

Adds Extremely Randomized Trees Algorithm and Min Impurity Decrease.

Motivation

Adds Extremely Randomized Trees Algorithm and Min Impurity Decrease.

Paper reference

https://link.springer.com/article/10.1007/s10994-006-6226-1
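
For readers unfamiliar with the referenced paper, the core idea of Extremely Randomized Trees is that a split point for a candidate feature is drawn uniformly at random between the feature's observed minimum and maximum, rather than chosen by searching every threshold. The following is a minimal sketch of that one step only. The class and method names are hypothetical and are not taken from this PR's code.

```java
import java.util.Random;

// Hypothetical illustration, not Tribuo's implementation.
final class RandomSplitSketch {
    /**
     * Draws a split point uniformly from [min, max) of the observed feature values.
     * Assumes featureValues is non-empty. If every value is identical there is
     * no useful split, so the single observed value is returned.
     */
    static double randomSplitPoint(double[] featureValues, Random rng) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : featureValues) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (!(min < max)) {
            return min; // constant feature: caller should treat this as "no split"
        }
        return min + rng.nextDouble() * (max - min);
    }
}
```

The constant-feature branch mirrors the case mentioned in the commit list, where a feature has only one value.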


Craigacp

Small changes, mostly formatting and javadoc. I would like consistent use of "features" rather than "attributes" as we don't use "attributes" to mean features anywhere else in the codebase.

Craigacp · 2020-09-30T19:25:28Z

.../DecisionTree/src/main/java/org/tribuo/classification/dtree/impl/ClassifierTrainingNode.java

+
+        lessThanOrEqual = new ClassifierTrainingNode(impurity, lessThanData, lessThanIndices.size, depth + 1, featureIDMap, labelIDMap);
+        greaterThan = new ClassifierTrainingNode(impurity, greaterThanData, numExamples - lessThanIndices.size, depth + 1, featureIDMap, labelIDMap);
+        List<AbstractTrainingNode<Label>> output = new ArrayList<>();


I should have sized this ArrayList to 2 originally, but now that you've moved it we should definitely do that. Ditto for the other places in Regression which create a small list. When we move up from Java 8 we can replace it with List.of(), which will be better.
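
A minimal sketch of the pre-sizing suggestion, assuming the list only ever holds the two child nodes. The class and method names are hypothetical, and a String stands in for the node type so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of sizing a two-element list up front.
final class SplitOutputSketch {
    static List<String> childrenOf(String lessThanOrEqual, String greaterThan) {
        // Initial capacity 2 avoids allocating the default backing array
        // (capacity 10 on first add) for a list that holds exactly two elements.
        List<String> output = new ArrayList<>(2);
        output.add(lessThanOrEqual);
        output.add(greaterThan);
        return output;
    }
}
```

On Java 9 or later, `List.of(lessThanOrEqual, greaterThan)` would return an immutable two-element list instead, which is only suitable if callers never modify the result.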

Craigacp · 2020-09-30T19:27:32Z

.../DecisionTree/src/main/java/org/tribuo/classification/dtree/impl/ClassifierTrainingNode.java

-    public double getImpurity() {
-        return impurity.impurity(labelCounts);
-    }
+    public double getImpurity() { return impurityScore;}


This shouldn't be on a single line.

Craigacp · 2020-09-30T19:31:42Z

...ation/DecisionTree/src/main/java/org/tribuo/classification/dtree/impurity/LabelImpurity.java

@@ -75,7 +75,8 @@ default public double impurity(double[] input) {
    }

    /**
-     * Calculates the impurity assuming the input are weighted counts, normalizing by their sum.
+     * Calculates the impurity assuming the input are weighted counts, normalizing by their sum. The resulting


This javadoc isn't quite right. The counts are assumed to be weighted, they are converted into a probability distribution by dividing by their sum, and then the impurity is multiplied by the sum. It's missing the "probability distribution" bit.
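
The three steps described in this comment can be sketched as follows. This is an illustration only: Gini impurity is used as an example measure, and the class and method names are hypothetical rather than the actual LabelImpurity code.

```java
// Hypothetical illustration of the weighted-impurity calculation.
final class WeightedImpuritySketch {
    /** Gini impurity of a probability distribution: 1 - sum(p_i^2). */
    static double gini(double[] probabilities) {
        double sumSquares = 0.0;
        for (double p : probabilities) {
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    /**
     * Treats the input as weighted counts, converts them into a probability
     * distribution by dividing by their sum, computes the impurity of that
     * distribution, then multiplies the impurity by the sum.
     */
    static double impurityWeighted(double[] weightedCounts) {
        double sum = 0.0;
        for (double c : weightedCounts) {
            sum += c;
        }
        if (sum <= 0.0) {
            return 0.0; // assumption: an empty node contributes no impurity
        }
        double[] probabilities = new double[weightedCounts.length];
        for (int i = 0; i < weightedCounts.length; i++) {
            probabilities[i] = weightedCounts[i] / sum;
        }
        return sum * gini(probabilities);
    }
}
```

For weighted counts {3, 1} the sum is 4, the distribution is (0.75, 0.25), the Gini impurity is 0.375, and the weighted result is 1.5.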

Craigacp · 2020-09-30T19:33:25Z

Classification/DecisionTree/src/test/java/org/tribuo/classification/dtree/TestCART.java


-    public void testCART(Pair<Dataset<Label>,Dataset<Label>> p) {
-        TreeModel<Label> m = t.train(p.getA());
+    public void testCART(Pair<Dataset<Label>,Dataset<Label>> p, AbstractCARTTrainer<Label> trainer) {


These should be sharply typed (i.e. CARTClassificationTrainer not AbstractCARTTrainer<Label>). I'd prefer nobody ever use the AbstractCARTTrainer type in user code, so we shouldn't do it in the tests unless it's strictly necessary.

Craigacp · 2020-09-30T19:33:58Z

Classification/DecisionTree/src/test/java/org/tribuo/classification/dtree/TestCART.java


 public class TestCART {

    private static final CARTClassificationTrainer t = new CARTClassificationTrainer();
+    private static final CARTClassificationTrainer randomt = new CARTClassificationTrainer(5,     2, 0.0f,1.0f, true,


Looks like there's some random whitespace in this line?

Craigacp · 2020-09-30T19:53:40Z

Regression/RegressionTree/src/main/java/org/tribuo/regression/rtree/CARTRegressionTrainer.java

                        (node.getNumExamples() > minChildWeight)) {
                    if (numFeaturesInSplit != featureIDMap.size()) {
                        Util.randpermInPlace(originalIndices, localRNG);
                        System.arraycopy(originalIndices, 0, indices, 0, numFeaturesInSplit);
                    }
-                    List<AbstractTrainingNode<Regressor>> nodes = node.buildTree(indices);
+                    List<AbstractTrainingNode<Regressor>> nodes = node.buildTree(indices, localRNG,
+                            getUseRandomSplitPoints(),getMinImpurityDecrease() * weightSum);


Maybe precompute getMinImpurityDecrease()*weightSum rather than do it every time?
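
A minimal sketch of the suggested hoisting, assuming the threshold and weight sum do not change while the tree is built. The names are hypothetical, and the `>=` comparison is an assumption about how the threshold is applied, not something confirmed by the diff.

```java
// Hypothetical illustration of computing a loop-invariant threshold once.
final class ThresholdHoistSketch {
    static int countAcceptedSplits(double[] impurityDecreases,
                                   double minImpurityDecrease,
                                   double weightSum) {
        // Computed once, outside the loop, instead of once per node.
        final double scaledMinImpurityDecrease = minImpurityDecrease * weightSum;
        int accepted = 0;
        for (double decrease : impurityDecreases) {
            // Assumption: a split is kept when its decrease meets the threshold.
            if (decrease >= scaledMinImpurityDecrease) {
                accepted++;
            }
        }
        return accepted;
    }
}
```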

Craigacp · 2020-09-30T19:54:58Z

...egressionTree/src/main/java/org/tribuo/regression/rtree/impl/JointRegressorTrainingNode.java

    }

    @Override
-    public double getImpurity() {
+    public double getImpurity() { return impurityScore;}


Formatting.

Craigacp · 2020-09-30T19:55:17Z

...egressionTree/src/main/java/org/tribuo/regression/rtree/impl/JointRegressorTrainingNode.java

+     * Calculates the impurity score of the node.
+     * @return the impurity score of the node.
+     */
+    private double calcImpurity(){


Put a space between () and the open curly brace.

Craigacp · 2020-09-30T20:06:32Z

...RegressionTree/src/test/java/org/tribuo/regression/rtree/TestCARTJointRegressionTrainer.java


-    public void testJointRegressionTree(Pair<Dataset<Regressor>,Dataset<Regressor>> p) {
-        TreeModel<Regressor> m = t.train(p.getA());
+    public void testJointRegressionTree(Pair<Dataset<Regressor>,Dataset<Regressor>> p, AbstractCARTTrainer<Regressor> trainer) {


Similar to the classification tests, I'd prefer it if the sharp CARTJointRegressionTrainer is used rather than AbstractCARTTrainer<Regressor> unless you're sharing the tests across both types of regression tree trainer.

Craigacp · 2020-09-30T20:07:00Z

...sion/RegressionTree/src/test/java/org/tribuo/regression/rtree/TestCARTRegressionTrainer.java

-    public void testIndependentRegressionTree(Pair<Dataset<Regressor>,Dataset<Regressor>> p) {
-        Model<Regressor> m = t.train(p.getA());
+    public void testIndependentRegressionTree(Pair<Dataset<Regressor>,Dataset<Regressor>> p,
+                                              AbstractCARTTrainer<Regressor> trainer) {


Sharp type.

Craigacp

Three tiny changes to clean things up. Looks good otherwise.

Craigacp · 2020-10-02T13:36:23Z

Common/Trees/src/main/java/org/tribuo/common/tree/AbstractCARTTrainer.java

@@ -126,8 +126,8 @@ public synchronized void postConfig() {
            throw new IllegalArgumentException("maxDepth must be greater than or equal to 1");
        }

-        if ((minChildWeight < 0.0f)) {
-            throw new IllegalArgumentException("minChildWeight must be greater than or equal to 0");
+        if ((minChildWeight <= 0.0f)) {


There are two sets of parentheses here.

Craigacp · 2020-10-02T13:41:22Z

Regression/RegressionTree/src/main/java/org/tribuo/regression/rtree/TrainTest.java

@@ -121,20 +124,20 @@ public static void main(String[] args) throws IOException {
        SparseTrainer<Regressor> trainer;
        switch (o.treeType) {
            case CART_INDEPENDENT:
-                if (o.fraction <= 0) {
-                    trainer = new CARTRegressionTrainer(o.depth,o.minChildWeight,0.0f, 1, false, impurity,
+                if (o.fraction == 0) {


It's probably better to fix the default value of fraction to be 1.0, and then remove this if clause entirely.

Craigacp · 2020-10-02T13:42:16Z

...on/DecisionTree/src/main/java/org/tribuo/classification/dtree/CARTClassificationOptions.java

@@ -69,10 +73,12 @@ public CARTClassificationTrainer getTrainer() {
        CARTClassificationTrainer trainer;
        switch (cartTreeAlgorithm) {
            case CART:
-                if (cartSplitFraction <= 0) {
-                    trainer = new CARTClassificationTrainer(cartMaxDepth, cartMinChildWeight, 1, impurity, cartSeed);
+                if (cartSplitFraction == 0) {


Probably best to set the default value of cartSplitFraction to 1.0 and then remove this if statement entirely.
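
The effect of this suggestion (and the matching one for the regression TrainTest above) can be sketched as follows. With a default of 1.0 meaning "use every feature", the special case for a zero or negative fraction is no longer needed. The option holder, the method, and the ceiling-based conversion from fraction to feature count are all hypothetical and are not taken from the Tribuo options classes.

```java
// Hypothetical illustration of defaulting the split fraction to 1.0.
final class SplitFractionSketch {
    /** Stand-in for an options object; the default uses all features. */
    static final class Options {
        float splitFraction = 1.0f;
    }

    /** Assumed conversion: round the fraction of features up, with at least one feature. */
    static int featuresPerSplit(Options options, int numFeatures) {
        return Math.max(1, (int) Math.ceil(options.splitFraction * numFeatures));
    }
}
```

With the default left untouched, `featuresPerSplit` returns the full feature count, so the caller can construct the trainer unconditionally.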

Craigacp

Looks good, thanks.

samanthacampo added 17 commits September 3, 2020 16:51

Add in random splitting param and config in various files

f989948

Continue implementing

18954cc

Alter build_tree to take number generator and choose random split

9532117

Passing all but sparse data on classification.

2187d0d

Classifier passes all tests. Fixed case where attr only has one value.

df74800

For regression: Add random split param. Add rng build_tree arg. Fix a…

3ea27e4

…ll resulting issues. Get all tests running.

Finish first draft of regression, both joint and regular

9a7fd48

Complete writing and passing regression tests

5d9e7ee

Complete writing and passing classification tests

d5cd73c

Separate classes version complete.

bf18b89

Cache impurityScore in node constructor. Don't calculate splits if no…

f4ca279

…de is pure.

Refactor Random Nodes into regular nodes.

a957471

Change to using public accessor for getImpurity

8928853

Start adding the minImpurityDecrease param in and add checks of options.

f0660c3

First draft complete of minImpurityDecrease

b5a78e0

Fix minor details of minImpurityDecrease changes

bb97e8e

Merge branch 'main' into samanthacampo/extreme_trees

7b23d1d

Craigacp requested changes Sep 30, 2020


samanthacampo changed the title ~~Samanthacampo/extreme trees~~ Adds Extremely Randomized Trees Algorithm and Min Impurity Decrease Sep 30, 2020

samanthacampo added 3 commits September 30, 2020 17:55

Start making PR corrections

c277f69

Complete all PR changes

07ef23f

Final PR fixes

4445b98

Craigacp requested changes Oct 2, 2020


Second set of PR fixes

8382267

Craigacp approved these changes Oct 2, 2020


Craigacp merged commit f072c2c into oracle:main Oct 2, 2020
