Set of Jython tools to perform data mining tasks using Weka

Needs Jython and Weka.

Uses the Michalski and Chilausky soybean data set from the UCI Machine Learning Repository.

Originally developed for a class assignment.


  1. **setup.bat** Shows how to set up the classpath to use WEKA from Jython
  2. Pre-processes the soybean data set
  3. Finds the subset of attributes that gives the best classification accuracy for a given algorithm and data set
  4. Weka .arff file reader and writer
  5. Splits a WEKA .arff file so that the class distribution is preserved and the aggregate accuracy of a set of classifiers is maximized or minimized. The output is two WEKA .arff files
  6. **find_soybean_split.bat** (batch/shell file) Shows how to run the split on a pre-processed soybean .arff file
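
To make the .arff reader and writer item concrete, here is a minimal sketch of reading and writing the dense ARFF layout in plain Python, written to run under Jython 2.x as well. It is not the code from this repository: the function names `read_arff` and `write_arff` are hypothetical, and the sketch assumes unquoted attribute names and values, with no sparse rows.

```python
def read_arff(lines):
    """Parse ARFF text lines into (relation, attributes, rows).

    attributes is a list of (name, type_string) pairs and rows is a list
    of lists of strings. Assumption: no quoted names or values containing
    spaces or commas, and no sparse-format rows.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('%'):
            continue  # skip blank lines and '%' comments
        if in_data:
            rows.append([value.strip() for value in line.split(',')])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == '@relation':
            relation = line.split(None, 1)[1]
        elif keyword == '@attribute':
            # Split into keyword, name, and the rest (the type declaration).
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif keyword == '@data':
            in_data = True
    return relation, attributes, rows


def write_arff(relation, attributes, rows):
    """Return the ARFF text as a list of lines (no trailing newlines)."""
    out = ['@relation ' + relation, '']
    for name, attr_type in attributes:
        out.append('@attribute %s %s' % (name, attr_type))
    out.append('')
    out.append('@data')
    for row in rows:
        out.append(','.join(row))
    return out
```

Both functions work on lists of lines, so they can be combined with `open(path).readlines()` for reading and `'\n'.join(...)` for writing.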

Results are in the data directory.

Example use

The batch/shell file find_soybean_split.bat runs the split on the pre-processed soybean .arff file to create the training and test files. These give the classification results in soybean.split.results.txt, summarized below:

| Classifier | Correct (out of 60) | Percentage correct |
|---|---|---|
| NaiveBayes | 57 | 95 % |
| J48 | 58 | 96.67 % |
| BayesNet | 59 | 98.33 % |
| RandomForest | 59 | 98.33 % |
| JRip | 60 | 100 % |
| KStar | 60 | 100 % |
| SMO | 60 | 100 % |
| MLP | 60 | 100 % |
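
The class-preserving part of the split that produces the training and test files can be sketched as follows. This is an illustration only, not the repository's implementation: the function name `stratified_split` and its parameters are hypothetical, the class label is assumed to be the last value in each row, and the search for a split that maximizes or minimizes aggregate classifier accuracy is not reproduced here.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction, seed=0, class_index=-1):
    """Split rows into (train, test) with per-class proportions preserved.

    Assumption: each row is a list whose class label sits at class_index.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_index]].append(row)

    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label][:]  # copy so the caller's rows stay in order
        rng.shuffle(group)
        # Take the same fraction of every class for the test set.
        n_test = int(round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Calling the function with different `seed` values yields different candidate splits with the same class distribution, which is the kind of candidate set a search over splits could score.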