importing from rubyforge & cleanup

0 parents commit 966aa2e69cee8da9e58add08b37f81741c633f8d @igrigorik committed Feb 23, 2009
@@ -0,0 +1,18 @@
+= Decision Tree
+
+A Ruby library that implements the ID3 (information gain) algorithm for decision tree learning. It can currently learn from both continuous and discrete datasets.
+
+- The discrete model assumes unique labels; the resulting tree can be graphed and converted into a PNG for visual analysis.
+- The continuous model looks at all possible values for a variable and iteratively chooses the best threshold between all possible assignments. This results in a binary tree that is partitioned by a threshold at every step (e.g. temperature > 20C).
+
+== Features
+- ID3 algorithms for continuous and discrete cases, with support for inconsistent datasets.
+- Graphviz component to visualize the learned tree (http://rockit.sourceforge.net/subprojects/graphr/)
+- Support for multiple and symbolic outputs, and graphing of continuous trees.
+- Returns default value when no branches are suitable for input
+
+== Implementation
+- Ruleset is a class that trains an ID3Tree with 2/3 of the training data, converts it into a set of rules, and prunes the rules with the remaining 1/3 of the training data (in the style of C4.5).
+- Bagging is a bagging-based trainer that trains 10 Ruleset trainers and, when predicting, chooses the best output by voting.
+
+Blog post with explanation & examples: http://www.igvita.com/2007/04/16/decision-tree-learning-in-ruby/
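The README describes the continuous case as choosing, for each variable, the threshold that best splits the data. The sketch below illustrates that idea with plain Ruby and does not use the library itself. The method names `entropy` and `best_threshold` are hypothetical, and trying midpoints between consecutive sorted values as candidate thresholds is an assumption; the library may enumerate candidates differently.

```ruby
# Shannon entropy (in bits) of a list of class labels.
def entropy(labels)
  total = labels.size.to_f
  counts = labels.each_with_object(Hash.new(0)) { |label, h| h[label] += 1 }
  counts.values.sum do |count|
    p = count / total
    -p * Math.log2(p)
  end
end

# For one numeric column, return [threshold, information_gain] for the
# binary split (value < threshold vs. value >= threshold) with the highest
# gain. Each row is an array whose last element is the class label.
# Returns nil when the column holds fewer than two distinct values.
def best_threshold(rows, column)
  base = entropy(rows.map(&:last))
  values = rows.map { |row| row[column] }.uniq.sort

  # Assumption: candidate thresholds are midpoints between neighbouring values.
  candidates = values.each_cons(2).map { |a, b| (a + b) / 2.0 }

  scored = candidates.map do |threshold|
    below, above = rows.partition { |row| row[column] < threshold }
    remainder = [below, above].sum do |part|
      part.size.to_f / rows.size * entropy(part.map(&:last))
    end
    [threshold, base - remainder]
  end

  scored.max_by { |_, gain| gain }
end

# Toy data: one temperature column followed by a 0/1 label.
rows = [[10.0, 0], [15.0, 0], [25.0, 1], [30.0, 1]]
threshold, gain = best_threshold(rows, 0)
puts "threshold=#{threshold} gain=#{gain}"
```

In a full tree learner this search would be repeated over every column at every node, and the rows would then be partitioned by the winning column and threshold before recursing.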
@@ -0,0 +1,33 @@
+require 'rubygems'
+require 'decisiontree'
+include DecisionTree
+
+# ---Continuous-----------------------------------------------------------------------------------------
+
+# Read in the training data
+training, attributes = [], nil
+File.foreach('data/continuous-training.txt') { |line|
+  data = line.strip.chomp('.').split(',')
+  attributes ||= data
+  training.push(data.collect { |v| v == 'healthy' ? 1 : (v == 'colic' ? 0 : v.to_f) })
+}
+
+# Remove the attribute row from the training data
+training.shift
+
+# Instantiate the tree, and train it based on the data (set default to '1')
+dec_tree = ID3Tree.new(attributes, training, 1, :continuous)
+dec_tree.train
+
+#---- Test the tree....
+
+# Read in the test cases
+# Note: omit the attribute line (first line), we know the labels from the training data
+test = []
+File.foreach('data/continuous-test.txt') { |line|
+  data = line.strip.chomp('.').split(',')
+  test.push(data.collect { |v| v == 'healthy' ? 1 : (v == 'colic' ? 0 : v.to_f) })
+}
+
+# Let the tree predict the output and compare it to the true specified value
+test.each { |t| predict = dec_tree.predict(t); puts "Predict: #{predict} ... True: #{t.last}"}
@@ -0,0 +1,13 @@
+4.60000,139.00000,101.00000,28.80000,7.64000,13.80000,265.06000,1.50000,0.60000,60.00000,12.00000,40.00000,40.00000,3.52393,0.20000,17.61965,healthy.
+4.30000,139.00000,101.00000,26.20000,3.61000,16.10000,518.74103,1.90000,0.01000,68.00000,12.00000,38.00000,36.00000,5.70834,0.20000,28.54170,healthy.
+4.20000,139.00000,101.00000,29.20000,4.96000,13.00000,265.06000,2.10000,0.50000,62.00000,12.00000,39.00000,44.00000,3.44906,0.20000,17.24530,healthy.
+4.40000,141.00000,103.00000,28.30000,12.65000,14.10000,197.60699,2.20000,0.10000,66.00000,12.00000,32.00000,44.00000,3.30135,0.20000,16.50675,healthy.
+4.50000,136.00000,101.00000,26.10000,3.27000,13.40000,300.61499,1.40000,0.01000,68.00000,16.00000,33.00000,50.00000,6.94524,0.70000,9.92177,healthy.
+4.30000,151.00000,112.00000,21.90000,42.66000,21.40000,613.52301,11.50000,172.89999,68.00000,26.00000,63.00000,92.00000,2.69917,0.50000,5.39834,colic.
+3.00000,145.00000,103.00000,22.30000,83.93000,22.70000,476.97101,43.40000,139.50000,86.00000,60.00000,67.00000,68.00000,2.73668,0.20000,13.68340,colic.
+3.40000,134.00000,98.00000,25.90000,90.15000,13.50000,265.06000,2.10000,1.30000,66.00000,20.00000,40.00000,52.00000,3.13565,0.50000,6.27130,colic.
+2.90000,136.00000,92.00000,34.70000,5.81000,12.20000,243.71800,4.20000,22.80000,61.00000,20.00000,41.00000,48.00000,3.20928,0.20000,16.04640,colic.
+3.80000,140.00000,99.00000,28.20000,88.92000,16.60000,695.82800,7.00000,2.60000,60.00000,28.00000,49.00000,80.00000,1.67106,0.50000,3.34212,colic.
+3.70000,143.00000,105.00000,21.60000,93.67000,20.10000,265.06000,4.60000,38.80000,68.00000,16.00000,43.00000,48.00000,3.51757,0.50000,7.03514,colic.
+3.70000,142.00000,103.00000,27.00000,100.24000,15.70000,386.71301,2.30000,0.01000,85.00000,40.00000,45.00000,48.00000,2.81077,0.50000,5.62154,colic.
+3.20000,138.00000,99.00000,29.80000,80.77000,12.40000,224.11301,2.30000,3.90000,61.00000,24.00000,37.00000,40.00000,3.32568,0.50000,6.65136,colic.