importing from rubyforge & cleanup
igrigorik committed Feb 23, 2009
0 parents commit 966aa2e
Showing 12 changed files with 635 additions and 0 deletions.
18 changes: 18 additions & 0 deletions README.rdoc
@@ -0,0 +1,18 @@
= Decision Tree

A Ruby library that implements the ID3 (information gain) algorithm for decision tree learning. Both continuous and discrete datasets can currently be learned.

- The discrete model assumes unique labels and can be graphed and converted into a PNG for visual analysis.
- The continuous model looks at all possible values for a variable and iteratively chooses the best threshold among all possible assignments. This results in a binary tree which is partitioned by the threshold at every step (e.g. temperature > 20C).
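
The threshold search described for the continuous model can be sketched roughly as follows. This is a minimal illustration and not the library's actual code: the helper names `entropy` and `best_threshold` are hypothetical, and the choice of midpoints between consecutive sorted values as candidate thresholds is an assumption.

```ruby
# Shannon entropy (in bits) of a list of class labels.
def entropy(labels)
  total = labels.size.to_f
  labels.group_by { |label| label }.values.sum { |group|
    p = group.size / total
    -p * Math.log2(p)
  }
end

# Hypothetical helper: returns [threshold, gain] for the best binary split
# of one continuous attribute. Candidate thresholds are assumed to be the
# midpoints between consecutive distinct sorted values.
def best_threshold(values, labels)
  base = entropy(labels)
  pairs = values.zip(labels)
  candidates = values.uniq.sort.each_cons(2).map { |a, b| (a + b) / 2.0 }

  candidates.map { |t|
    below, above = pairs.partition { |value, _| value < t }
    # Weighted entropy that remains after splitting at t
    remainder = [below, above].sum { |part|
      (part.size.to_f / pairs.size) * entropy(part.map(&:last))
    }
    [t, base - remainder]
  }.max_by { |_, gain| gain }
end

# Made-up toy data: one temperature attribute and a binary label
temps  = [15.0, 18.0, 19.0, 22.0, 25.0, 30.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, gain = best_threshold(temps, labels)
puts "split at temperature > #{threshold} (gain #{gain.round(3)})"
```

Applying the same search recursively to each side of the chosen split would yield the binary tree described above.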

== Features
- ID3 algorithms for continuous and discrete cases, with support for inconsistent datasets.
- Graphviz component to visualize the learned tree (http://rockit.sourceforge.net/subprojects/graphr/)
- Support for multiple and symbolic outputs, and for graphing of continuous trees.
- Returns a default value when no branch is suitable for the input.

== Implementation
- Ruleset is a class that trains an ID3Tree with 2/3 of the training data, converts it into a set of rules, and prunes the rules with the remaining 1/3 of the training data (in a C4.5 way).
- Bagging is a bagging-based trainer (quite obvious), which trains 10 Ruleset trainers and, when predicting, chooses the best output by voting.
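
The 2/3 and 1/3 split and the voting step can be illustrated with a small sketch. It does not use the library's Ruleset or Bagging classes: `split_train_prune` and `vote` are hypothetical helpers, the shuffle before the split is an assumption (the README does not say how rows are chosen), and plain lambdas stand in for trained classifiers.

```ruby
# Hypothetical helper: shuffle the rows (an assumption) and split them into
# 2/3 for growing the tree and 1/3 for pruning the rules.
def split_train_prune(rows, rng: Random.new(42))
  shuffled = rows.shuffle(random: rng)
  cut = (shuffled.size * 2) / 3
  [shuffled[0...cut], shuffled[cut..-1]]
end

# Majority vote over the predictions of several classifiers.
# A classifier here is any object that responds to #call(sample).
def vote(classifiers, sample)
  classifiers.map { |classifier| classifier.call(sample) }
             .group_by { |prediction| prediction }
             .max_by { |_, votes| votes.size }
             .first
end

# Made-up rows: one attribute followed by a binary label
rows = (1..9).map { |i| [i.to_f, i > 4 ? 1 : 0] }
grow, prune = split_train_prune(rows)
puts "grow: #{grow.size} rows, prune: #{prune.size} rows"

# Stand-ins for trained classifiers, each with a slightly different threshold
stand_ins = [
  ->(s) { s.first > 4 ? 1 : 0 },
  ->(s) { s.first > 5 ? 1 : 0 },
  ->(s) { s.first > 2 ? 1 : 0 }
]
winner = vote(stand_ins, [4.5])
puts "voted prediction: #{winner}"
```

With ties, this sketch returns whichever tied prediction was seen first; how the library breaks ties is not stated in the README.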

Blog post with explanation & examples: http://www.igvita.com/2007/04/16/decision-tree-learning-in-ruby/
33 changes: 33 additions & 0 deletions examples/continuous-id3.rb
@@ -0,0 +1,33 @@
require 'rubygems'
require 'decisiontree'
include DecisionTree

# ---Continuous-----------------------------------------------------------------------------------------

# Read in the training data
training, attributes = [], nil
File.foreach('data/continuous-training.txt') do |line|
  data = line.strip.chomp('.').split(',')
  # The first line holds the attribute names
  attributes ||= data
  # Encode the label as 1 (healthy) or 0 (colic); parse every other field as a float
  training.push(data.collect { |v| v == 'healthy' ? 1 : (v == 'colic' ? 0 : v.to_f) })
end

# Remove the attribute row from the training data
training.shift

# Instantiate the tree and train it on the data (the default value is 1)
dec_tree = ID3Tree.new(attributes, training, 1, :continuous)
dec_tree.train

#---- Test the tree....

# Read in the test cases
# Note: the test file has no attribute line (first line); we know the labels from the training data
test = []
File.foreach('data/continuous-test.txt') do |line|
  data = line.strip.chomp('.').split(',')
  # Encode the label as 1 (healthy) or 0 (colic); parse every other field as a float
  test.push(data.collect { |v| v == 'healthy' ? 1 : (v == 'colic' ? 0 : v.to_f) })
end

# Let the tree predict the output and compare it to the true specified value
test.each do |t|
  predict = dec_tree.predict(t)
  puts "Predict: #{predict} ... True: #{t.last}"
end
13 changes: 13 additions & 0 deletions examples/data/continuous-test.txt
@@ -0,0 +1,13 @@
4.60000,139.00000,101.00000,28.80000,7.64000,13.80000,265.06000,1.50000,0.60000,60.00000,12.00000,40.00000,40.00000,3.52393,0.20000,17.61965,healthy.
4.30000,139.00000,101.00000,26.20000,3.61000,16.10000,518.74103,1.90000,0.01000,68.00000,12.00000,38.00000,36.00000,5.70834,0.20000,28.54170,healthy.
4.20000,139.00000,101.00000,29.20000,4.96000,13.00000,265.06000,2.10000,0.50000,62.00000,12.00000,39.00000,44.00000,3.44906,0.20000,17.24530,healthy.
4.40000,141.00000,103.00000,28.30000,12.65000,14.10000,197.60699,2.20000,0.10000,66.00000,12.00000,32.00000,44.00000,3.30135,0.20000,16.50675,healthy.
4.50000,136.00000,101.00000,26.10000,3.27000,13.40000,300.61499,1.40000,0.01000,68.00000,16.00000,33.00000,50.00000,6.94524,0.70000,9.92177,healthy.
4.30000,151.00000,112.00000,21.90000,42.66000,21.40000,613.52301,11.50000,172.89999,68.00000,26.00000,63.00000,92.00000,2.69917,0.50000,5.39834,colic.
3.00000,145.00000,103.00000,22.30000,83.93000,22.70000,476.97101,43.40000,139.50000,86.00000,60.00000,67.00000,68.00000,2.73668,0.20000,13.68340,colic.
3.40000,134.00000,98.00000,25.90000,90.15000,13.50000,265.06000,2.10000,1.30000,66.00000,20.00000,40.00000,52.00000,3.13565,0.50000,6.27130,colic.
2.90000,136.00000,92.00000,34.70000,5.81000,12.20000,243.71800,4.20000,22.80000,61.00000,20.00000,41.00000,48.00000,3.20928,0.20000,16.04640,colic.
3.80000,140.00000,99.00000,28.20000,88.92000,16.60000,695.82800,7.00000,2.60000,60.00000,28.00000,49.00000,80.00000,1.67106,0.50000,3.34212,colic.
3.70000,143.00000,105.00000,21.60000,93.67000,20.10000,265.06000,4.60000,38.80000,68.00000,16.00000,43.00000,48.00000,3.51757,0.50000,7.03514,colic.
3.70000,142.00000,103.00000,27.00000,100.24000,15.70000,386.71301,2.30000,0.01000,85.00000,40.00000,45.00000,48.00000,2.81077,0.50000,5.62154,colic.
3.20000,138.00000,99.00000,29.80000,80.77000,12.40000,224.11301,2.30000,3.90000,61.00000,24.00000,37.00000,40.00000,3.32568,0.50000,6.65136,colic.