
Java implementation of a multi-class decision tree learning algorithm, designed for training on large files


mostafacs/DecisionTree


Decision Trees

This repository implements a decision tree classifier, a simple yet widely used classification technique.
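Decision tree learners usually choose the attribute to split on by measuring how much the split reduces the mixture of classes in each branch. The README does not say which measure this implementation uses, so the following is only a sketch assuming entropy-based information gain over categorical attributes, with rows laid out as in this project's PSV files (entity label in the first column, class in the last). The class name InformationGain is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: entropy-based information gain for one categorical attribute.
// The split criterion is an assumption; the repository does not document its own.
final class InformationGain {

    // Entropy (in bits) of a class distribution given as class -> count.
    static double entropy(Map<String, Integer> classCounts) {
        int total = 0;
        for (int c : classCounts.values()) {
            total += c;
        }
        double h = 0.0;
        for (int c : classCounts.values()) {
            if (c == 0) {
                continue;
            }
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Gain from splitting rows on column attrIndex; the class is the last column.
    static double gain(List<String[]> rows, int attrIndex) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        int classIndex = rows.get(0).length - 1;
        Map<String, Integer> parent = new HashMap<>();
        // attribute value -> (class -> count) for each branch of the split
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        for (String[] row : rows) {
            String cls = row[classIndex];
            parent.merge(cls, 1, Integer::sum);
            byValue.computeIfAbsent(row[attrIndex], v -> new HashMap<>())
                   .merge(cls, 1, Integer::sum);
        }
        // Weighted average entropy of the branches
        double weighted = 0.0;
        for (Map<String, Integer> child : byValue.values()) {
            int size = 0;
            for (int c : child.values()) {
                size += c;
            }
            weighted += (double) size / rows.size() * entropy(child);
        }
        return entropy(parent) - weighted;
    }
}
```

On an evenly split two-class set, an attribute that separates the classes perfectly scores 1 bit and one that leaves every branch as mixed as the parent scores 0. A column with a unique value per row also scores the maximum, which is why a label column such as this project's first column should not be treated as an attribute.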

What's Classification?

Classification is the task of assigning objects to one of several predefined categories.
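A trained decision tree assigns a record to one of those categories by starting at the root, following the branch that matches the record's value for the attribute tested there, and stopping at a leaf that names the class. A minimal sketch of that traversal, assuming categorical attributes and pipe-separated input records; TreeNode and its methods are hypothetical and are not this repository's classes.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a hypothetical tree node, not the repository's own node type.
final class TreeNode {
    final String label;   // class name if this node is a leaf, otherwise null
    final int attrIndex;  // PSV column tested at an internal node (-1 for leaves)
    final Map<String, TreeNode> children = new HashMap<>(); // attribute value -> subtree

    private TreeNode(String label, int attrIndex) {
        this.label = label;
        this.attrIndex = attrIndex;
    }

    static TreeNode leaf(String label) {
        return new TreeNode(label, -1);
    }

    static TreeNode internal(int attrIndex) {
        return new TreeNode(null, attrIndex);
    }

    // Walks from this node to a leaf, following the branch that matches each tested value.
    // Returns null when the record is too short or has a value no branch covers.
    String classify(String psvRecord) {
        String[] values = psvRecord.split("\\|", -1);
        TreeNode node = this;
        while (node.label == null) {
            if (node.attrIndex < 0 || node.attrIndex >= values.length) {
                return null;
            }
            TreeNode next = node.children.get(values[node.attrIndex]);
            if (next == null) {
                return null; // value never seen during training
            }
            node = next;
        }
        return node.label;
    }
}
```

Returning null for an unseen attribute value is one simple choice; a fuller implementation might fall back to the majority class recorded at that node.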

Implementation

This implementation is optimized for very large datasets. The training file must be in PSV (pipe-separated values) format; an example is provided in resources/vertebrate.psv. The first row holds the attribute names. The first column is not used as an attribute; it serves only as an entity label. The last column holds the entity's class.
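When a file is too large to load into memory, a trainer can stream it one line at a time and keep only per-attribute class counts, which is all an entropy or Gini calculation needs to score categorical splits at the root. The sketch below is an assumption about how such a pass could look, not a description of this repository's code; PsvCounts is a hypothetical name. It follows the layout above: a header row of attribute names, the label in the first column, the class in the last.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch only: one streaming pass over a PSV training file (hypothetical helper).
final class PsvCounts {
    // Header columns used as attributes: all except the first (label) and last (class).
    String[] attributeNames = new String[0];
    // "attribute=value" -> (class -> number of rows)
    final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // class -> number of rows
    final Map<String, Integer> classTotals = new HashMap<>();

    static PsvCounts read(Path file) throws IOException {
        PsvCounts result = new PsvCounts();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String header = in.readLine();
            if (header == null) {
                return result; // empty file: nothing to count
            }
            String[] columns = header.split("\\|", -1);
            if (columns.length < 2) {
                throw new IOException("Header needs at least a label and a class column");
            }
            result.attributeNames = Arrays.copyOfRange(columns, 1, columns.length - 1);
            String line;
            while ((line = in.readLine()) != null) { // only the current line is held in memory
                if (line.isEmpty()) {
                    continue;
                }
                String[] values = line.split("\\|", -1);
                if (values.length != columns.length) {
                    throw new IOException("Expected " + columns.length + " columns but got "
                            + values.length + ": " + line);
                }
                String cls = values[values.length - 1];
                result.classTotals.merge(cls, 1, Integer::sum);
                for (int i = 1; i < values.length - 1; i++) {
                    result.counts
                          .computeIfAbsent(columns[i] + "=" + values[i], k -> new HashMap<>())
                          .merge(cls, 1, Integer::sum);
                }
            }
        }
        return result;
    }
}
```

These counts only cover the root split; a full learner would repeat such a pass, or keep counts per tree node, for each level it grows.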

A Python script, generatePSVFromTables.py, generates a PSV file from a MySQL database. It uses the MySQLdb module, which is provided by the mysqlclient package, so install that first:

pip install mysqlclient

Then run:

python generatePSVFromTables.py

Edit the script to set your connection parameters:

host = "localhost"
user = "root"
password = "root"
database = "myinfo"

You must also replace the SQL query with your own statement:

sql = 'select e.id as "Employee Number" , e.first_name as "First Name" , e.last_name as "Last Name", e.haschildren as "Has Children" , d.department_name as "Department Name" from employees e,department d where e.department_no = d.id;'

Note: after generating the file, you must add the target class to the end of each row yourself, or edit the script to compute the class from your data.
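If you would rather not edit the Python script, the class column can also be appended afterwards. A minimal Java sketch that streams the file so it also works on large inputs; ClassColumn is a hypothetical name, and the rule deciding each row's class is a placeholder you would replace with your own logic.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Sketch only: appends a class column to every row of a PSV file (hypothetical helper).
final class ClassColumn {

    // Copies `in` to `out`, adding `className` to the header and rule(values) to each data row.
    static void append(Path in, Path out, String className,
                       Function<String[], String> rule) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                return; // empty input: nothing to write
            }
            writer.write(header + "|" + className);
            writer.newLine();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                // The rule sees the row's values; its result becomes the last column.
                writer.write(line + "|" + rule.apply(line.split("\\|", -1)));
                writer.newLine();
            }
        }
    }
}
```

For rows produced by the example query above, a hypothetical rule such as `v -> "yes".equals(v[3]) ? "parent" : "no-children"` would derive the class from the "Has Children" column.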

Usage

	import java.io.File;

	DecisionTree tree = new DecisionTree();
	// Train the decision tree from a PSV file
	tree.train(new File("resources/vertebrate.psv"));
	// Print the root node; its string form shows the learned tree as XML
	System.out.println(tree.getRootNode());
	// Classify a new record: same columns as the training file, without the class column
	System.out.println(tree.classify("gila monster|cold-blooded|scales|no|no|no|yes|yes"));

Requirements

Java 8

Planned Features

1- Maximum split feature.
2- Support splitting of continuous attributes.
3- A script to create .psv files from big-data sources (Hadoop, ...).
4- Parallel file processing: support learning from many files on different machines.
