Skip to content
forked from chrisspen/dtree

A simple pure-Python decision tree construction algorithm

License

Notifications You must be signed in to change notification settings

pombredanne/dtree

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dtree - A simple pure-Python decision tree construction algorithm

Overview

Given a training data set, it constructs a decision tree for classification or regression in a single batch or incrementally.

It loads data from CSV files. It expects the first row in the CSV to be a header, with each element conforming to the pattern "name:type:mode". Mode is optional, and denotes the class attribute. Type identifies the attribute as either a continuous, discrete, or nominal.

The module is loosely based on code published by Christopher Roach in his article Building Decision Trees in Python. I refactored his code to be more object-oriented, and extended it to support basic regression.

The class attribute can be either continuous, discrete or nominal, but all other attributes can only be discrete or nominal.

Installation

Download the code and then run:

python setup.py build
sudo python setup.py install

You can also install from PyPI using pip via:

sudo pip install dtree

Or upgrade from an earlier version via:

sudo pip install --upgrade dtree

Usage

Classification and regression are handled through the same interface, and differ only in the object returned by the predict() method and how the result from test() is interpreted.

With classification, this object will always be a DDist instance, representing a probability distribution over a set of discrete or nominal classes. In this case, the result from test() will be a CDist instance representing the classification accuracy.

With regression, this object will always be a CDist instance, representing a mean and variance. In this case, the result from test() will be a CDist instance representing the mean absolute error.

from dtree import Tree, Data

tree = Tree.build(Data('classification-training.csv'))
result = t.test(Data('classification-testing.csv'))
print 'Accuracy:',result.mean
prediction = tree.predict(dict(feature1=123, feature2='abc', feature3='hot'))
print 'best:',prediction.best
print 'probs:',prediction.probs

tree = Tree.build(Data('regression-training.csv'))
result = t.test(Data('regression-testing.csv'))
print 'MAE:',result.mean
prediction = tree.predict(dict(feature1=123, feature2='abc', feature3='hot'))
print 'mean:',prediction.mean
print 'variance:',prediction.variance

Features

  • building a classification or regression tree using batch or incremental/online methods

Todo

Does not yet support:

  • sparse training data
  • sparse query vector

History

0.1.0 - 2012.01.24 Initial development.

0.2.0 - 2012.02.08 Refactored to support incremental/online tree construction and forests.

About

A simple pure-Python decision tree construction algorithm

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.8%
  • Perl 1.2%