# Some Weka-based tools written in Jython

A set of Jython tools for performing data mining tasks with Weka.

http://bit.ly/weka_tools

Requires Jython and Weka.
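The two are wired together by putting weka.jar on the Java classpath before launching Jython, which is what setup.sh / setup.bat arrange. A rough sketch of the Unix side (the install paths are assumptions; adjust to your machine):

```shell
# Assumed install locations -- adjust to wherever weka.jar and jython live.
export CLASSPATH=$HOME/weka/weka.jar:$CLASSPATH

# With weka.jar on the classpath, Weka's Java packages are importable
# from Jython scripts such as weka_classifiers.py.
jython weka_classifiers.py
```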

Uses the UCI Michalski and Chilausky soybean data set.

Originally developed for a class assignment.

## Summary

  1. **setup.bat** Shows how to set up the classpath to use Weka from Jython
  2. **preprocess_soybeans.py** Pre-processes the soybean data set
  3. **find_best_attributes.py** Finds the subset of attributes that gives the best classification accuracy for a given algorithm and data set
  4. **arff.py** Weka .arff file reader and writer
  5. **split_data.py** Splits a Weka .arff file into two .arff files, preserving the class distribution while maximizing or minimizing the aggregate accuracy of a set of classifiers
  6. **find_soybean_split.bat / find_soybean_split.sh** Shows how to run split_data.py on a pre-processed soybean .arff file
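The class-distribution-preserving part of split_data.py (item 5) can be sketched in plain Python: group rows by class label and take the same test fraction from each group. This is a simplification only; the actual tool additionally searches for the split that maximizes or minimizes aggregate classifier accuracy, which is omitted here, and the function names are illustrative.

```python
import random
from collections import defaultdict

def stratified_split(rows, class_of, test_fraction=0.2, seed=0):
    """Split rows into (train, test) while roughly preserving the class
    distribution. `class_of` extracts the class label from a row."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[class_of(row)].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label, members in by_class.items():
        rng.shuffle(members)
        # Take the same fraction of each class for the test set.
        n_test = int(round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

Because the fraction is applied per class rather than to the whole data set, each class keeps roughly the same train/test ratio, which is the property the soybean split relies on.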

Results are in the data directory.
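For context on the .arff files in that directory: arff.py (item 4) reads and writes Weka's .arff format, which is a `@relation` header, a list of `@attribute` declarations, and comma-separated rows after `@data`. A minimal illustrative reader, ignoring quoting, comments inside data, and sparse instances that a full reader must handle (function name is an assumption, not arff.py's API):

```python
def read_arff(path):
    """Minimal .arff reader: returns (relation, attribute_names, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('%'):
                continue                     # skip blank lines and comments
            low = line.lower()
            if low.startswith('@relation'):
                relation = line.split(None, 1)[1]
            elif low.startswith('@attribute'):
                # "@attribute name type" -> keep the attribute name only
                attributes.append(line.split(None, 2)[1])
            elif low.startswith('@data'):
                in_data = True
            elif in_data:
                rows.append(line.split(','))
    return relation, attributes, rows
```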

## Example use of split_data.py

The batch/shell script find_soybean_split.bat / find_soybean_split.sh runs split_data.py on soybean-large.data.missing.values.replaced.arff to create the training and test files soybean-large.data.missing.values.replaced.best.train.arff and soybean-large.data.missing.values.replaced.best.test.arff. These give the classification results in soybean.split.results.txt, summarized as:

| Classifier   | Correct (out of 60) | Percentage correct |
|--------------|---------------------|--------------------|
| NaiveBayes   | 57                  | 95%                |
| J48          | 58                  | 96.67%             |
| BayesNet     | 59                  | 98.33%             |
| RandomForest | 59                  | 98.33%             |
| JRip         | 60                  | 100%               |
| KStar        | 60                  | 100%               |
| SMO          | 60                  | 100%               |
| MLP          | 60                  | 100%               |
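Each row comes from training a classifier on the train file and scoring it on the test file. For reference, a minimal sketch of that kind of evaluation against Weka's Java API, for the NaiveBayes row. It must run under Jython (Python 2 syntax) with weka.jar on the classpath, as arranged by setup.sh / setup.bat; it is an illustration, not the repo's actual driver script.

```python
# Jython only: these are Weka's Java classes, so weka.jar must be on the
# classpath before Jython starts.
from java.io import FileReader
from weka.core import Instances
from weka.classifiers import Evaluation
from weka.classifiers.bayes import NaiveBayes

# Load the split produced by split_data.py; the class is the last attribute.
train = Instances(FileReader("soybean-large.data.missing.values.replaced.best.train.arff"))
train.setClassIndex(train.numAttributes() - 1)
test = Instances(FileReader("soybean-large.data.missing.values.replaced.best.test.arff"))
test.setClassIndex(test.numAttributes() - 1)

# Train on the training split, evaluate on the held-out test split.
classifier = NaiveBayes()
classifier.buildClassifier(train)
evaluation = Evaluation(train)
evaluation.evaluateModel(classifier, test)
print evaluation.toSummaryString()
```

Swapping NaiveBayes for J48, BayesNet, and so on reproduces the other rows of the table.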