DTree-ID3

Description of program

This decision tree program implements the Iterative Dichotomiser 3 (ID3) algorithm. It is written in Java using object-oriented concepts.

The reader Class

The reader class reads CSV files as input and produces a two-dimensional String array as output. It uses classes from the java.io package to read the CSV file line by line and the java.util.StringTokenizer class to split each line on the comma delimiter. In a loop, one array is created per CSV line and stored as an element of the outer array; each inner array holds the delimited items of that line, i.e. the row's attributes. The reader also has a method to convert the 2D String array to a 2D ArrayList. An ArrayList implements the List interface (and therefore Collection), so the utility methods in java.util.Collections for tasks such as searching, sorting, counting the frequency of values and copying elements can be applied to it.
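The reading and conversion steps described above might look roughly like the following. This is a minimal sketch, not the code in reader.java: the class name CsvReaderSketch and the method names readCsv and toLists are assumptions made for illustration, and the input is taken as a java.io.Reader so that the example can run without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.StringTokenizer;

// Hypothetical sketch; names are illustrative and not taken from reader.java.
final class CsvReaderSketch {

    // Reads comma-separated lines into a 2D String array, one inner array per line.
    static String[][] readCsv(Reader source) {
        ArrayList<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                StringTokenizer tokens = new StringTokenizer(line, ",");
                String[] row = new String[tokens.countTokens()];
                for (int i = 0; i < row.length; i++) {
                    row[i] = tokens.nextToken().trim();
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows.toArray(new String[0][]);
    }

    // Converts the 2D String array into a 2D ArrayList, one inner list per row.
    static ArrayList<ArrayList<String>> toLists(String[][] table) {
        ArrayList<ArrayList<String>> result = new ArrayList<>();
        for (String[] row : table) {
            result.add(new ArrayList<>(Arrays.asList(row)));
        }
        return result;
    }
}
```

Note that StringTokenizer silently skips empty fields (two adjacent commas yield no token), so a dataset with missing values would need String.split or a CSV library instead.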

The DataPoint Class

The DataPoint class is used to create a data point holding all of its attribute values, i.e. a row in the dataset. The class has one field called attributes, of type ArrayList, which is created by the constructor. The following code returns the value of the first attribute of the data point: DataPoint point = new DataPoint(); point.attributes.get(0);

The id3 Class

getSubset method

Returns a 2D ArrayList holding the subset of the data in which a given domain (column) has a given value. It loops through the dataset, creates a DataPoint object for each row, and adds the DataPoint's ArrayList to the subset container if the row's value for that domain equals the parameter.
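As an illustration of the filtering loop described above, here is a minimal sketch. The parameter list (data, column, value) and the nested DataPoint stand-in are assumptions; the actual signature in the repository may differ.

```java
import java.util.ArrayList;

// Hypothetical sketch; the signature is assumed, not copied from the repository.
final class SubsetSketch {

    // Minimal stand-in for the DataPoint class: one row with its attribute values.
    static final class DataPoint {
        final ArrayList<String> attributes;

        DataPoint(ArrayList<String> attributes) {
            this.attributes = attributes;
        }
    }

    // Returns the rows whose value in the given column equals the given value.
    static ArrayList<ArrayList<String>> getSubset(
            ArrayList<ArrayList<String>> data, int column, String value) {
        ArrayList<ArrayList<String>> subset = new ArrayList<>();
        for (ArrayList<String> row : data) {
            DataPoint point = new DataPoint(row);
            if (point.attributes.get(column).equals(value)) {
                subset.add(point.attributes);
            }
        }
        return subset;
    }
}
```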

getSymbolValue method

Goes through the attribute ArrayList of the 2D domain ArrayList looking for a given value of the domain. If the value does not exist it is added; otherwise the method continues, so that the domain ArrayList records unique values only.

getAllValues method

Goes through the values of a column in a dataset and appends each unique value to an ArrayList.
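The uniqueness check used by getSymbolValue and getAllValues can be sketched as follows. This is an assumed shape (a column index over a 2D ArrayList), not the repository's code; it keeps values in order of first appearance.

```java
import java.util.ArrayList;

// Hypothetical sketch; the signature is an assumption for illustration.
final class UniqueValuesSketch {

    // Collects the distinct values of one column, in order of first appearance.
    static ArrayList<String> getAllValues(ArrayList<ArrayList<String>> data, int column) {
        ArrayList<String> values = new ArrayList<>();
        for (ArrayList<String> row : data) {
            String value = row.get(column);
            if (!values.contains(value)) {
                values.add(value); // record each value only once
            }
        }
        return values;
    }
}
```

ArrayList.contains is a linear scan, which is fine for the small domains typical of ID3 datasets; a java.util.LinkedHashSet would give the same ordering with constant-time lookups.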

calculateEntropy method

Calculates the entropy of the subset it receives and returns the result. It is called to calculate both Entropy(S) and Entropy(S_v).
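For reference, the entropy of a set S is the sum over all class labels of -p * log2(p), where p is the fraction of rows in S carrying that label. A minimal sketch of such a calculation follows; the classColumn parameter (the index of the label column) is an assumption, since the description above does not say how the method locates the class label.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; classColumn is an assumed parameter.
final class EntropySketch {

    // Shannon entropy (base 2) of the class labels found in classColumn.
    static double calculateEntropy(ArrayList<ArrayList<String>> subset, int classColumn) {
        if (subset.isEmpty()) {
            return 0.0; // an empty subset carries no uncertainty
        }
        Map<String, Integer> counts = new HashMap<>();
        for (ArrayList<String> row : subset) {
            counts.merge(row.get(classColumn), 1, Integer::sum);
        }
        double entropy = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / subset.size();
            entropy -= p * (Math.log(p) / Math.log(2)); // log base 2
        }
        return entropy;
    }
}
```

A subset split evenly between two classes has entropy 1, and a subset with a single class has entropy 0, which is the stopping condition used when building the tree.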

decomposeNode method

Finds the attribute that gives the best split, based on a measure of the quality of the split. Information gain is calculated in this method, while the data itself is split into subsets by the getSubset method. After the first split, the data is separated into one subset for each value of the chosen attribute, each attached to the parent node. The method then calls itself on each subset to find the next best attribute. This recursion continues until the entropy is equal to zero or there is no more data to split.
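The split-selection step can be illustrated with the standard ID3 formula Gain(S, A) = Entropy(S) minus the sum over the values v of A of (|S_v| / |S|) * Entropy(S_v). The sketch below is written under assumptions: the helper names informationGain and bestAttribute are hypothetical, rows are grouped with a map rather than through getSubset, and the recursion and tree node bookkeeping of decomposeNode are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of attribute selection; not the repository's decomposeNode.
final class InfoGainSketch {

    // Shannon entropy (base 2) of the class labels in classColumn.
    static double entropy(List<ArrayList<String>> rows, int classColumn) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (ArrayList<String> row : rows) {
            counts.merge(row.get(classColumn), 1, Integer::sum);
        }
        double result = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / rows.size();
            result -= p * (Math.log(p) / Math.log(2));
        }
        return result;
    }

    // Gain(S, A) = Entropy(S) - sum over v of |S_v| / |S| * Entropy(S_v).
    static double informationGain(List<ArrayList<String>> rows, int column, int classColumn) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        Map<String, List<ArrayList<String>>> partitions = new HashMap<>();
        for (ArrayList<String> row : rows) {
            partitions.computeIfAbsent(row.get(column), k -> new ArrayList<>()).add(row);
        }
        double remainder = 0.0;
        for (List<ArrayList<String>> part : partitions.values()) {
            remainder += (double) part.size() / rows.size() * entropy(part, classColumn);
        }
        return entropy(rows, classColumn) - remainder;
    }

    // Returns the candidate column with the highest gain, or -1 if there are no candidates.
    static int bestAttribute(List<ArrayList<String>> rows, List<Integer> candidates, int classColumn) {
        int best = -1;
        double bestGain = -1.0;
        for (int column : candidates) {
            double gain = informationGain(rows, column, classColumn);
            if (gain > bestGain) {
                bestGain = gain;
                best = column;
            }
        }
        return best;
    }
}
```

In a full implementation, the chosen column would be recorded on the node, one child would be created per value of that column, and the same selection would be repeated on each child's subset with the chosen column removed from the candidates.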

alreadyUsedToDecompose method

Checks whether an attribute has already been used for a split. It does so by checking whether the tree node of the given object has children assigned and whether its decompositionAttribute equals the attribute currently being considered. Otherwise, it repeats the check on the parent object, traversing back up the tree.
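The upward traversal described above could be sketched like this. The node layout (parent, children and decompositionAttribute fields) is an assumption based on the description; only decompositionAttribute is named in the text, and the real node.java may be organized differently.

```java
// Hypothetical sketch of a tree node; only decompositionAttribute is named in the description.
final class TreeNodeSketch {
    TreeNodeSketch parent;            // null for the root
    TreeNodeSketch[] children;        // null until the node has been decomposed
    int decompositionAttribute = -1;  // column used to split this node, -1 if none

    // True if this node or any of its ancestors was already split on the attribute.
    static boolean alreadyUsedToDecompose(TreeNodeSketch node, int attribute) {
        if (node.children != null && node.decompositionAttribute == attribute) {
            return true;
        }
        if (node.parent == null) {
            return false; // reached the root without finding the attribute
        }
        return alreadyUsedToDecompose(node.parent, attribute);
    }
}
```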

createDecisionTree method

Has two purposes. First, it calls the decomposeNode method to build the tree. Second, it invokes printTree to print the tree.

printTree method

Prints the output held in the objects of the decision tree program.

