A project to visualize classifications in a decision tree.
A script that takes a Weka J48 decision tree and an ARFF file as input, and builds a graphical representation of which tree node each instance in the ARFF file was assigned to. Here's what it looks like:
Those are plots of the sample data. Both result from a random selection from the same corpus, so they look very similar.
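The idea behind the plots is just counting how many instances end up at each node. Here's a minimal sketch of that counting step. It is not the code in j48Vis.py: the nested-dict tree, the `assign_leaf` and `leaf_counts` helpers, the leaf ids, and the `petalwidth` values are all made up for illustration, and it only handles numeric "less than or equal" splits.

```python
from collections import Counter

# Hypothetical hand-built tree (NOT the format j48Vis.py reads).
# Internal nodes test one numeric attribute against a threshold;
# leaves carry only an id.
TREE = {
    "attr": "petalwidth", "threshold": 0.6,
    "le": {"leaf": "L1"},
    "gt": {
        "attr": "petalwidth", "threshold": 1.7,
        "le": {"leaf": "L2"},
        "gt": {"leaf": "L3"},
    },
}


def assign_leaf(node, instance):
    """Follow the tests from the root down until a leaf is reached."""
    while "leaf" not in node:
        branch = "le" if instance[node["attr"]] <= node["threshold"] else "gt"
        node = node[branch]
    return node["leaf"]


def leaf_counts(tree, instances):
    """Count how many instances were assigned to each leaf."""
    return Counter(assign_leaf(tree, inst) for inst in instances)


# Made-up instances standing in for rows of an ARFF file.
instances = [
    {"petalwidth": 0.2},
    {"petalwidth": 1.3},
    {"petalwidth": 1.5},
    {"petalwidth": 2.1},
]

counts = leaf_counts(TREE, instances)
print(dict(counts))  # {'L1': 1, 'L2': 2, 'L3': 1}
```

Per-leaf counts like these are the kind of numbers a graph can then be drawn from.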
Let's say you've got a training set and a test set, and you built the decision tree off of the training set. Now you want to show that the decision tree is used similarly with both the training and the test set. You could give your reader a bunch of numbers, or you could give her a cool graph. Cool graphs are cooler.
The training and test sets should probably result in similar decision tree use if:
- they were selected randomly from a larger corpus
- the tree isn't overfitted to the training set
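If you do want a number to go along with the graph, one option is to compare the share of instances that each leaf receives in the two sets. The sketch below uses total variation distance, which is my choice for illustration and not something the script computes. The helper names and the leaf counts are hypothetical, and it assumes both sets are non-empty.

```python
def leaf_proportions(counts):
    """Turn raw per-leaf counts into shares that sum to 1."""
    total = sum(counts.values())
    return {leaf: n / total for leaf, n in counts.items()}


def total_variation(counts_a, counts_b):
    """Half the summed absolute difference between the two leaf shares.

    0.0 means both sets use the tree identically; 1.0 means they
    never land in the same leaf.
    """
    p = leaf_proportions(counts_a)
    q = leaf_proportions(counts_b)
    leaves = set(p) | set(q)
    return 0.5 * sum(abs(p.get(leaf, 0.0) - q.get(leaf, 0.0)) for leaf in leaves)


# Made-up per-leaf counts for a training set and a test set.
train_counts = {"L1": 50, "L2": 30, "L3": 20}
test_counts = {"L1": 24, "L2": 16, "L3": 10}

distance = total_variation(train_counts, test_counts)
print(round(distance, 3))  # 0.02
```

A small distance is what you'd hope to see when both conditions above hold; a large one suggests the two sets are exercising different parts of the tree.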
If you want to try it out: right now the script only outputs the numbers that will later be transformed into the graphical output. Some sample files are included, and they should be all that you need.
j48Vis.py *SampleARFF.arff sampleDecisionTree
Requires Python 3. It'd probably work in Python 2.7 with only a bit of modification.
I'm using GPL 2 right now. Why not.