##Dependencies
- numpy
- matplotlib
- pillow
- scipy
The source code can be run under windows or linux with python 2.7+ and the libraries above.
##Directory structure
./
├── doc
│ └── report.pdf (the report)
├── asset
│ ├── dataset.txt (data set)
│ ├── learning-curve.png (learning curve plot)
│ ├── pruned-tree.png (visualization of the pruned decision tree)
│ └── tree.png (visualization of the decision tree)
└── src
├── main.py (generate the learning curve)
├── preprocess.py (split the dataset and store them in json)
├── tree.py (for building, pruning, drawing decision trees)
└── util.py (directory structure configurations)
##How to generate the results
Note: python scripts should be run under the src
directory. All images will be placed under the asset
directory.
- Make sure the data set
dataset.txt
is placed underasset
- To sample from the dataset, run
python preprocess.py
undersrc
. The training set and the test set will be stored astraining.json
andtesting.json
underasset
. The default sampling probability of the test set is 0.5. If you need to change it, for example, to 0.3, runpython preprocess.py -p 0.3
. - To plot the decision tree built with
training.json
andtesting.json
generated withpreprocess.py
, runpython tree.py
. The plots will be placed underasset
namedtree.png
andpruned-tree.png
- To generate the learning curve, run
python main.py
. The plot will be placed underasset
namedlearning-curve.png
##About
- Github repository
- Time: Dec. 2014