# Decision Tree Classifier


## Usage

For the following dataset:

| feature 0 | feature 1 | feature 2 | label |
|-----------|-----------|-----------|-------|
| 1         | 1         | 1         | 0     |
| 1         | 1         | 0         | 1     |
| 0         | 0         | 1         | 1     |
| 1         | 1         | 0         | 0     |
| 1         | 0         | 0         | 1     |
```python
# Assumes DecisionTree has been imported from this package.

# Each row holds feature 0, feature 1, feature 2, and the label.
training_data = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1]
]

design_matrix = [row[:-1] for row in training_data]  # feature columns only
target_values = [row[-1] for row in training_data]   # label column only

decision_tree = DecisionTree()
decision_tree.fit(design_matrix, target_values)
predictions = decision_tree.predict(design_matrix)
```
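
As a quick sanity check (a minimal sketch using only the `fit`/`predict` calls shown above; no additional API is assumed), the predictions can be compared against the original labels:

```python
# Fraction of training rows the fitted tree reproduces correctly.
accuracy = sum(p == t for p, t in zip(predictions, target_values)) / len(target_values)
print(f"Training accuracy: {accuracy:.2f}")
```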

To print the tree:

```python
decision_tree.print()
```

This outputs the following:

```
Is feature 1 >= 1?
--> True:
  Is feature 2 >= 1?
  --> True:
    Predict {0: '100.0%'}
  --> False:
    Predict {1: '50.0%', 0: '50.0%'}
--> False:
  Predict {1: '100.0%'}
```

In the case of ambiguous records like `[1, 1, 0]`, where two records have the same feature values but different labels, the tree always predicts the first key in the prediction dictionary (1 in this example).
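
For example, assuming `predict` accepts a list of rows and returns a list of labels, as in the usage above:

```python
# The ambiguous record reaches the {1: '50.0%', 0: '50.0%'} leaf,
# so the first key, 1, is returned.
print(decision_tree.predict([[1, 1, 0]]))  # [1]
```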

Adapted from:

Let's Write a Decision Tree Classifier from Scratch - Machine Learning Recipes #8