Skip to content

Implementation of decision trees for binary categorical data using numpy. Includes regular decision trees, random forest, and boosted trees.

Notifications You must be signed in to change notification settings

pwc2/decision-trees

Repository files navigation

decision-trees

decision-trees contains implementations of the decision tree classifier, random forest, and boosted trees (with AdaBoost) for binary categorical data.

Requirements:

  • numpy 1.17.3

  • pandas 0.25.2

Usage:

For a single decision tree:

import pandas as pd

from models.decision_tree import DecisionTreeClassifier

train_set = pd.read_csv('data/pa3_train.csv')
validation_set = pd.read_csv('data/pa3_val.csv')
test_set = pd.read_csv('data/pa3_test.csv')

tree = DecisionTreeClassifier(train=train_set, validation=validation_set,test=test_set, 
                                label='class', max_depth=2)
results = tree.train()

For a random forest:

import pandas as pd

from models.random_forest import RandomForestClassifier

train_set = pd.read_csv('data/pa3_train.csv')
validation_set = pd.read_csv('data/pa3_val.csv')
test_set = pd.read_csv('data/pa3_test.csv')

rf = RandomForestClassifier(train=train_set, validation=validation_set, test=test_set,
                             label='class', n_trees=5, n_features=5, seed=1, max_depth=2)
results = rf.train()

For a boosted trees with AdaBoost:

import pandas as pd

from models.adaboost import AdaBoostClassifier

train_set = pd.read_csv('data/pa3_train.csv')
validation_set = pd.read_csv('data/pa3_val.csv')
test_set = pd.read_csv('data/pa3_test.csv')

boosted_trees = AdaBoostClassifier(train=train_set, validation=validation_set, test=test_set,
                                     label='class', n_classifiers=5, max_depth=2)
results = boosted_trees.train()

Data:

The data/ folder contains .csv files with training, validation, and test sets.

To run models:

  • run_part1.py creates decision trees with varied depths.
  • run_part2.py creates random forests with varied parameters.
  • run_part3.py creates boosted trees with varied parameters.

python main.py will run all three parts in order, output will be saved in model_output folder.

Future improvements:

  • Refactor AdaBoostClassifier and RandomForestClassifier classes to inherit attributes from DecisionTreeClassifier class.

About

Implementation of decision trees for binary categorical data using numpy. Includes regular decision trees, random forest, and boosted trees.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published