# Assignment 4: Decision Tree Learning

In this assignment, you will work with a class of supervised learning models called decision trees to classify examples according to some decision boundary.


This assignment is due on T-Square on November 3 by 9:35 AM.

Introduction:
-------

For this assignment we're going to need an explicit way to make structured decisions. The following is a decision node: a class representing some atomic choice in a binary decision graph. It can represent a class label (i.e. a final decision) or a binary decision that guides us through a flow chart to arrive at a decision.

In [12]:
class DecisionNode():

    def __init__(self, left, right, decision_function,class_label=None):
        self.left = left
        self.right = right
        self.decision_function = decision_function
        self.class_label = class_label

    def decide(self, feature):
        if self.class_label is not None:
            return self.class_label

        return self.left.decide(feature) if self.decision_function(feature) else self.right.decide(feature)
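
To make the flow concrete, here is a minimal sketch of how a node might be used. The class is repeated so the snippet runs on its own, and the names `yes_leaf`, `no_leaf`, and `stump` are made up for illustration. Note that `decide` follows the left child when the decision function returns a truthy value, and the right child otherwise.

```python
class DecisionNode():
    # Same node class as above, repeated so this snippet runs on its own.
    def __init__(self, left, right, decision_function, class_label=None):
        self.left = left
        self.right = right
        self.decision_function = decision_function
        self.class_label = class_label

    def decide(self, feature):
        if self.class_label is not None:
            return self.class_label
        if self.decision_function(feature):
            return self.left.decide(feature)
        return self.right.decide(feature)


# Leaves carry only a class label.
yes_leaf = DecisionNode(None, None, None, 1)
no_leaf = DecisionNode(None, None, None, 0)

# Hypothetical one-split tree: go left (label 1) when the first attribute is 1.
stump = DecisionNode(yes_leaf, no_leaf, lambda feature: feature[0] == 1)

stump_on_true = stump.decide([1, 0, 0, 0])   # follows the left branch
stump_on_false = stump.decide([0, 1, 1, 0])  # follows the right branch
```

A single-split tree like this is rarely enough on its own, but larger trees are built the same way: a child can be either a leaf or another decision node.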


Part 1: Warmup: Building a tree by hand
--------
20 pts.

In the code block below, construct a tree of decision nodes by hand in order to classify the data below. Select tests to be as small as possible (in terms of attributes), breaking ties among tests with the same number of attributes by selecting the one that classifies the greatest number of examples correctly. If multiple tests have the same number of attributes and classify the same number of examples, then break the tie using attributes with lower index numbers (e.g. select $A_1$ over $A_2$).

| Datum  | $A_1$ | $A_2$ | $A_3$ | $A_4$ |  y  |
| -------| :---: | :---: | :---: | :---: | ---:|
| $x_1$  |   1   |   0   |   0   |   0   |  1  |
| $x_2$  |   1   |   0   |   1   |   1   |  1  |
| $x_3$  |   0   |   1   |   0   |   0   |  1  |
| $x_4$  |   0   |   1   |   1   |   0   |  0  |
| $x_5$  |   1   |   1   |   0   |   1   |  1  |
| $x_6$  |   0   |   1   |   0   |   1   |  0  |
| $x_7$  |   0   |   0   |   1   |   1   |  1  |
| $x_8$  |   0   |   0   |   1   |   0   |  0  |


In [13]:
examples = [[1,0,0,0],
            [1,0,1,1],
            [0,1,0,0],
            [0,1,1,0],
            [1,1,0,1],
            [0,1,0,1],
            [0,0,1,1],
            [0,0,1,0]]

classes = [1,1,1,0,1,0,1,0]

# Constructing nodes one at a time,
# build a decision tree as specified above.
# There exists a correct tree with fewer than 6 nodes.

def A1DFunction(feature):
    if feature[0] == 1:
        return 1
    else:
        return 0
def A2DFunction(feature):
    if feature[1] == 1:
        return 1
    else:
        return 0
def A3DFunction(feature):
    if feature[2] == 1:
        return 1
    else:
        return 0
def A4DFunction(feature):
    if feature[3] == 1:
        return 1
    else:
        return 0

TDNode = DecisionNode(None, None, None, 1)
FDNode = DecisionNode(None, None, None, 0)
A42DNode = DecisionNode(FDNode, TDNode, A4DFunction)
A3DNode = DecisionNode(FDNode, A42DNode, A3DFunction)
A4DNode = DecisionNode(TDNode, FDNode, A4DFunction)
A2DNode = DecisionNode(A3DNode, A4DNode, A2DFunction)

decision_tree_root = DecisionNode(TDNode, A2DNode, A1DFunction)


Part 1b: Validation
--------

Now that we have a decision tree, we're going to need some way to evaluate its performance. In most cases we'd reserve a portion of the training data for evaluation, or use cross validation, but for now let's just see how your tree does on the provided examples. In the stubbed-out code below, fill out the methods to compute accuracy, precision, recall, and the confusion matrix for your classifier output.

In [14]:
def confusion_matrix(classifier_output, true_labels):
    #TODO output should be [[true_positive, false_negative], [false_positive, true_negative]]
    true_positive = 0
    false_positive = 0
    false_negative = 0
    true_negative = 0
    for x in range(0, len(classifier_output)):
        if classifier_output[x] == 1 and true_labels[x] == 1:
            true_positive += 1
        elif classifier_output[x] == 0 and true_labels[x] == 0:
            true_negative += 1
        elif classifier_output[x] == 1 and true_labels[x] == 0:
            false_positive += 1
        elif classifier_output[x] == 0 and true_labels[x] == 1:
            false_negative += 1
    return [[true_positive, false_negative], [false_positive, true_negative]]

def precision(classifier_output, true_labels):
    #TODO precision is measured as: true_positive/ (true_positive + false_positive)
    true_positive = 0
    false_positive = 0
    for x in range(0, len(classifier_output)):
        if classifier_output[x] == 1 and true_labels[x] == 1:
            true_positive += 1
        elif classifier_output[x] == 1 and true_labels[x] == 0:
            false_positive += 1
    if true_positive + false_positive == 0:
        return 0
    return (true_positive/ float(true_positive + false_positive))

def recall(classifier_output, true_labels):
    #TODO: recall is measured as: true_positive/ (true_positive + false_negative)
    true_positive = 0
    false_negative = 0
    for x in range(0, len(classifier_output)):
        if classifier_output[x] == 1 and true_labels[x] == 1:
            true_positive += 1
        elif classifier_output[x] == 0 and true_labels[x] == 1:
            false_negative += 1
    if true_positive + false_negative == 0:
        return 0
    return (true_positive/ float((true_positive + false_negative)))

def accuracy(classifier_output, true_labels):
    #TODO accuracy is measured as:  correct_classifications / total_number_examples
    correct_classifications = 0
    for x in range(0, len(classifier_output)):
        if classifier_output[x] == true_labels[x]:
            correct_classifications += 1
    if len(true_labels) == 0:
        return 0
    return (correct_classifications/float(len(true_labels)))

classifier_output = [decision_tree_root.decide(example) for example in examples]

# Make sure your hand-built tree is 100% accurate.
p1_accuracy = accuracy( classifier_output, classes )
p1_precision = precision(classifier_output, classes)
p1_recall = recall(classifier_output, classes)
p1_confusion_matrix = confusion_matrix(classifier_output, classes)

print p1_accuracy
print p1_precision
print p1_recall
print p1_confusion_matrix


1.0
1.0
1.0
[[5, 0], [0, 3]]
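
Since a perfect classifier makes all four metrics look the same, it can help to check the definitions on a case with a few mistakes. The sketch below uses made-up predictions and labels (not assignment data) and a hypothetical helper `binary_counts`. It follows the same `[[true_positive, false_negative], [false_positive, true_negative]]` layout as above.

```python
def binary_counts(predicted, actual):
    """Return (tp, fn, fp, tn) for 0/1 label lists of equal length."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    return tp, fn, fp, tn


# Made-up predictions and labels, chosen so the metrics differ from each other.
predicted = [1, 1, 0, 1, 0, 0, 1, 1]
actual = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fn, fp, tn = binary_counts(predicted, actual)
matrix = [[tp, fn], [fp, tn]]

precision_value = tp / float(tp + fp)            # 3 of the 5 predicted positives are right
recall_value = tp / float(tp + fn)               # 3 of the 4 actual positives are found
accuracy_value = (tp + tn) / float(len(actual))  # 5 of the 8 examples are right
```

Here precision (0.6), recall (0.75), and accuracy (0.625) all disagree, which is the usual situation on real data.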


Part 2: Decision Tree Learning
-------
40 pts.

As the number of examples we have grows, it rapidly becomes impractical to build these trees by hand, so it becomes necessary to specify a procedure by which we can automagically construct these trees.

For starters, let's consider the following algorithm (a variation of C4.5) for the construction of a decision tree from a given set of examples:

    1) Check for base cases:
         a) If all elements of a list are of the same class, return a leaf node with the appropriate class label.
         b) If a specified depth limit is reached, return a leaf labeled with the most frequent class.

    2) For each attribute alpha: evaluate the normalized information gain obtained by splitting on alpha

    3) Let alpha_best be the attribute with the highest normalized information gain

    4) Create a decision node that splits on alpha_best

    5) Recur on the sublists obtained by splitting on alpha_best, and add those nodes as children of node
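
For reference, with $p_c$ the fraction of examples of class $c$ in a set $S$, and $S_v$ the subset of $S$ sent down branch $v$ of a split on $\alpha$, the quantities in steps 2 and 3 are usually defined as

$$H(S) = -\sum_c p_c \log_2 p_c, \qquad IG(S, \alpha) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v).$$

C4.5 normalizes the gain by the split information, $-\sum_v \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$, giving the gain ratio. The sketch below works these out for a split on $A_1$ in the Part 1 data. The function names and signatures are illustrative assumptions and differ from the stubs you are asked to fill in.

```python
import math


def entropy(labels):
    """Shannon entropy in bits of a list of 0/1 labels."""
    total = len(labels)
    result = 0.0
    for value in (0, 1):
        count = labels.count(value)
        if count:
            p = count / float(total)
            result -= p * math.log(p, 2)
    return result


def information_gain(labels, branches):
    """Parent entropy minus the size-weighted entropy of the branches."""
    total = float(len(labels))
    remainder = sum(len(b) / total * entropy(b) for b in branches if b)
    return entropy(labels) - remainder


def gain_ratio(labels, branches):
    """Information gain divided by the split information of the partition."""
    total = float(len(labels))
    split_info = -sum(len(b) / total * math.log(len(b) / total, 2)
                      for b in branches if b)
    if split_info == 0:
        return 0.0  # everything went down one branch, so the split is useless
    return information_gain(labels, branches) / split_info


# Part 1 data: class labels and the A_1 column.
classes = [1, 1, 1, 0, 1, 0, 1, 0]
a1 = [1, 1, 0, 0, 1, 0, 0, 0]

a1_true = [c for c, a in zip(classes, a1) if a == 1]
a1_false = [c for c, a in zip(classes, a1) if a == 0]

gain_a1 = information_gain(classes, [a1_true, a1_false])
ratio_a1 = gain_ratio(classes, [a1_true, a1_false])
```

The $A_1 = 1$ branch is pure (all three examples are class 1), so all of the remaining uncertainty sits in the $A_1 = 0$ branch.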

In the \_\_build_tree\__ method below, implement the above algorithm. In the "classify" method below, write a function to produce classifications for a list of features once your decision tree has been built.

In [15]:
def entropy(class_vector):
    import math
    # TODO: Compute the Shannon entropy for a vector of classes
    # Note: Classes will be given as either a 0 or a 1.
    t = len(class_vector)
    if t == 0:
        return 0
    result = 0.0
    # Sum -p*log2(p) over the classes that actually occur (0*log(0) is treated as 0)
    for count in (class_vector.count(1), class_vector.count(0)):
        if count > 0:
            fraction = count/float(t)
            result -= fraction*math.log(fraction, 2)
    return result

def information_gain(previous_classes, current_classes ):
    # TODO: Implement information gain
    def expected_entropy(previous_classes, current_classes):
        prevP = previous_classes.count(1)
        prevN = previous_classes.count(0)
        prevT = len(previous_classes)
        currP = current_classes.count(1)
        currN = current_classes.count(0)
        currT = len(current_classes)
        #classes of the examples that are not in current_classes (the other branch of the split)
        notGiven_classes = [0]*(prevN - currN) + [1]*(prevP - currP)
        ngP = notGiven_classes.count(1)
        ngN = notGiven_classes.count(0)
        ngT = len(notGiven_classes)
        return ( ((currT/float(prevT))*(entropy(current_classes))) + ((ngT/float(prevT))*(entropy(notGiven_classes))) )
    return (entropy(previous_classes) - expected_entropy(previous_classes,current_classes))


class DecisionTree():

    def __init__(self, depth_limit=float('inf')):
        self.root = None
        self.depth_limit = depth_limit

    def fit(self, features, classes):
        self.root = self.__build_tree__(features, classes)

    def __build_tree__(self, features, classes, depth=0):
        import copy
        import numpy as np
        #TODO Implement the algorithm as specified above
        #print "depth is: " + str(depth)
        #print "features: " + str(features)
        #print "classes: " + str(classes)

        #print "features size: " + str(len(features)) + "x" + str(len(features[0]))
        #print "classes size: " + str(len(classes))

        TDNode = DecisionNode(None, None, None, 1)
        FDNode = DecisionNode(None, None, None, 0)
        """
        1) Check for base cases:
            a)If all elements of a list are of the same class, return a leaf node with the appropriate class label.
            b)If a specified depth limit is reached, return a leaf labeled with the most frequent class.
            My addition to algo: If there are no attributes left, return most frequent class
        """
        if classes == [0]*len(classes):
            return FDNode
        elif classes == [1]*len(classes):
            return TDNode

        #stop when no attributes are left or the depth limit has been reached
        if len(features[0]) == 0 or (self.depth_limit != float('inf') and depth == self.depth_limit):
            if classes.count(1) > classes.count(0):
                return TDNode
            else:
                return FDNode

        """
        2) For each attribute alpha: evaluate the normalized information gain gained by splitting on alpha
        """
        bestAttributeIndex = 0
        bestInfoGain = 0.0
        #go through each attribute
        for x in range(0, len(features[0])):
            #attribute column
            a = [row[x] for row in features]
            #split on true
            #class labels of the examples where attribute a was true
            curr_classes = []
            for y in range(0, len(a)):
                if a[y] > 0: #== 1:
                    curr_classes.append(classes[y])

            infoGain = information_gain(classes, curr_classes)
            if infoGain > bestInfoGain:
                bestAttributeIndex = x
                bestInfoGain = infoGain

        """
        3) Let alpha_best be the attribute with the highest normalized information gain
        """
        alpha_best = [row[bestAttributeIndex] for row in features]

        """
        4) Create a decision node that splits on alpha_best
        """
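        #remove best attribute column so it is not reused further down
        #(note: child nodes then index the reduced rows, while decide() receives full feature vectors)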
        new_features = copy.deepcopy(features)
        for x in range(0, len(new_features)):
            new_features[x] = np.delete(new_features[x], bestAttributeIndex)

        l_features = []
        l_classes = []
        r_features = []
        r_classes = []
        #split cases that will go left and right; left gets the cases where alpha_best is true
        for x in range(0, len(alpha_best)):
            if alpha_best[x] > 0: #== 1:
                l_features.append(new_features[x])
                l_classes.append(classes[x]) #.add
            else:
                r_features.append(new_features[x])
                r_classes.append(classes[x]) #.add

        def RootDFunction(feature):
            if feature[bestAttributeIndex] > 0:
                return 1
            else:
                return 0

        #print "about to build a tree with l features: " + str(l_features)
        #print "about to build a tree with l class: " + str(l_classes)
        #print "about to build a tree with r reatures: " + str(r_features)
        #print "about to build a tree with r class: " + str(r_classes)
        #dn left, right, decisionfunction, class
        root_node = DecisionNode(self.__build_tree__(l_features, l_classes, depth+1), self.__build_tree__(r_features, r_classes, depth+1), RootDFunction)

        """
        5) Recur on the sublists obtained by splitting on alpha_best, and add those nodes as children of node
        """
        # already doing that above

        return root_node

    def classify(self, features):
        #TODO Use a fitted tree to classify a list of feature vectors
        # Your output should be a list of class labels (either 0 or 1)
        return [self.root.decide(feature) for feature in features]


Part 2b: Validation
--------

For this part of the assignment we're going to use a relatively simple dataset (banknote authentication, found in 'part2_data.csv'). In the section below there are methods to load the data in a consistent format.

In general, reserving part of your data as a test set can lead to unpredictable performance: an unusually lucky or unlucky choice of train/test split could give you a very inaccurate idea of how your classifier performs. That's where k-fold cross validation comes in.

In the method below, we'll split the dataset at random into k equal subsections. Then, iterating over each of the k subsections, we'll reserve that subsection for testing and use the other k-1 for training. Averaging the results of each fold should give us a more consistent idea of how the classifier is doing.
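
As a rough sketch of the splitting procedure just described (not the required implementation), the helper below shuffles the example indices once and then carves them into k disjoint test folds. The name `k_fold_indices` and its `seed` parameter are assumptions made for illustration. It works on indices only, so the caller is expected to look up the corresponding examples and classes.

```python
import random


def k_fold_indices(num_examples, k, seed=None):
    """Yield (train_indices, test_indices) pairs for k disjoint test folds."""
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)  # shuffle once so the folds never overlap
    # Slicing with a stride of k spreads any remainder across the folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical usage on 10 examples with k = 5.
splits = list(k_fold_indices(10, 5, seed=0))
```

Because the shuffle happens only once, every example lands in exactly one test fold, which is what makes the averaged score a fair summary of the whole dataset.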


In [16]:
def load_csv(data_file_path, class_index=-1):
    import numpy as np

    with open(data_file_path, 'r') as handle:
        contents = handle.read()
    rows = contents.split('\n')
    out = np.array([  [float(i) for i in r.split(',')] for r in rows if r ])
    classes= list(map(int,  out[:, class_index]))
    features = out[:, :class_index]
    return features, classes

def generate_k_folds(dataset, k):
    import random
    #TODO this method should return a list of folds,
    # where each fold is a tuple like (training_set, test_set)
    # where each set is a tuple like (examples, classes)
    examples, classes = dataset

    folds = []

    splitSize = len(classes)/k
    splitValue = splitSize * (k-1)
    for x in range(0, k):
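        #note: reshuffling inside the loop makes every fold an independent random split,
        #so test sets from different folds can overlap
        indexes = list(xrange(len(classes)))
        random.shuffle(indexes)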

        training_examples = []
        training_classes = []
        test_examples = []
        test_classes = []

        for y in range(0, splitValue):
            indexToTake = indexes[y]
            training_examples.append(examples[indexToTake])
            training_classes.append(classes[indexToTake])
        for z in range(splitValue, len(classes)):
            indexToTake = indexes[z]
            test_examples.append(examples[indexToTake])
            test_classes.append(classes[indexToTake])

        training_set = (training_examples, training_classes)
        test_set = (test_examples, test_classes)

        f = (training_set, test_set)
        folds.append(f)
    return folds


dataset = load_csv('part2_data.csv')
ten_folds = generate_k_folds(dataset, 10)

#on average your accuracy should be higher than 60%.
accuracies = []
precisions = []
recalls = []
confusion = []

for fold in ten_folds:
    train, test = fold
    train_features, train_classes = train
    test_features, test_classes = test
    tree = DecisionTree( )
    tree.fit( train_features, train_classes)
    output = tree.classify(test_features)

    accuracies.append( accuracy(output, test_classes))
    precisions.append( precision(output, test_classes))
    recalls.append( recall(output, test_classes))
    confusion.append( confusion_matrix(output, test_classes))

print accuracies
print precisions
print recalls
print confusion


[0.8273381294964028, 0.7913669064748201, 0.8345323741007195, 0.8057553956834532, 0.8273381294964028, 0.8345323741007195, 0.8561151079136691, 0.8273381294964028, 0.8489208633093526, 0.8201438848920863]
[0.8103448275862069, 0.7692307692307693, 0.8518518518518519, 0.7301587301587301, 0.8333333333333334, 0.7704918032786885, 0.75, 0.8253968253968254, 0.7936507936507936, 0.8028169014084507]
[0.7833333333333333, 0.78125, 0.7540983606557377, 0.8214285714285714, 0.8088235294117647, 0.8392857142857143, 0.9230769230769231, 0.8, 0.8620689655172413, 0.8382352941176471]
[[[47, 13], [11, 68]], [[50, 14], [15, 60]], [[46, 15], [8, 70]], [[46, 10], [17, 66]], [[55, 13], [11, 60]], [[47, 9], [14, 69]], [[48, 4], [16, 71]], [[52, 13], [11, 63]], [[50, 8], [13, 68]], [[57, 11], [14, 57]]]


Part 3: Random Forests
-------
30 pts.

The decision boundaries drawn by decision trees are very sharp, and fitting a decision tree of unbounded depth to a list of examples almost inevitably leads to overfitting. In an attempt to decrease the variance of our classifier we're going to use a technique called 'Bootstrap Aggregating' (often abbreviated 'bagging').

A Random Forest is a collection of decision trees, built as follows:

1) For every tree we're going to build:

    a) Subsample the examples provided to us (with replacement) in accordance with a provided example subsampling rate.
    
    b) From the sample in a), choose attributes at random to learn on (in accordance with a provided attribute subsampling rate)
    
    c) Fit a decision tree to the subsample of data we've chosen (to a certain depth)
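
Steps a) and b) can be sketched with the standard library alone. The helper names `bootstrap_sample` and `project` are hypothetical, and the toy data is made up. One design choice worth noting, which is an assumption on my part rather than a requirement of the assignment: the sketch returns the chosen attribute indices so that a feature vector seen at classification time can be reduced to the same columns the tree was trained on.

```python
import random


def bootstrap_sample(features, classes, example_rate, attr_rate, rng):
    """Draw one bagged training set plus the attribute indices it keeps."""
    num_examples = int(len(features) * example_rate)
    num_attrs = max(1, int(len(features[0]) * attr_rate))

    # Rows are drawn with replacement, so duplicates are expected.
    rows = [rng.randrange(len(features)) for _ in range(num_examples)]
    # Attributes are drawn without replacement.
    attrs = sorted(rng.sample(range(len(features[0])), num_attrs))

    sample_features = [[features[r][a] for a in attrs] for r in rows]
    sample_classes = [classes[r] for r in rows]
    return sample_features, sample_classes, attrs


def project(feature, attrs):
    """Reduce a full feature vector to the columns a given tree was trained on."""
    return [feature[a] for a in attrs]


# Toy data, made up for illustration.
features = [[0.1, 1.0, 5.0, 2.0], [0.2, 0.0, 4.0, 3.0],
            [0.3, 1.0, 3.0, 4.0], [0.4, 0.0, 2.0, 5.0]]
classes = [1, 0, 1, 0]

rng = random.Random(0)
sample_features, sample_classes, attrs = bootstrap_sample(
    features, classes, example_rate=0.75, attr_rate=0.5, rng=rng)
```

Each tree would then be fit on `sample_features` and `sample_classes`, and queried with `project(feature, attrs)` rather than the full vector.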
    
Classification for a random forest is then done by taking a majority vote of the classifications yielded by each tree in the forest after it classifies an example.
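
The voting step is small enough to sketch directly. To keep the snippet self-contained, the three "trees" below are plain functions invented for illustration rather than fitted decision trees, and `majority_vote` and `forest_classify` are hypothetical names. Ties are resolved in favor of class 0 here, which is one reasonable convention among several.

```python
def majority_vote(votes):
    """Return 1 if strictly more than half of the 0/1 votes are 1, else 0."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def forest_classify(trees, features):
    """Classify each feature vector by polling every tree and taking the majority."""
    return [majority_vote([tree(feature) for tree in trees]) for feature in features]


# Three hypothetical "trees", written as plain functions for brevity.
trees = [
    lambda f: 1 if f[0] > 0 else 0,
    lambda f: 1 if f[1] > 0 else 0,
    lambda f: 1 if f[0] + f[1] > 1 else 0,
]

predictions = forest_classify(trees, [[1, 1], [1, 0], [0, 0]])
```

Using an odd number of trees avoids ties altogether for binary classification.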



In [17]:
class RandomForest():

    def __init__(self, num_trees, depth_limit, example_subsample_rate, attr_subsample_rate):
        self.trees = []
        self.num_trees = num_trees
        self.depth_limit = depth_limit
        self.example_subsample_rate = example_subsample_rate
        self.attr_subsample_rate = attr_subsample_rate

    def fit(self, features, classes):
        import numpy as np
        # TODO implement the above algorithm to build a random forest of decision trees
        self.trees = []

        #index from 0 to length of classes
        indexes_features = list(xrange(len(classes)))
        indexes_attributes = list(xrange(len(features[0])))

        #create B number of trees
        for b in range(0, self.num_trees):
            """
            a) Subsample the examples provided us (with replacement) in accordance with a provided example subsampling rate.
            """
            #choose from features, number of samples, and with replacement
            numExamplesToSample = int(len(features)*self.example_subsample_rate)
            indexesToSampleFeatures = np.random.choice(indexes_features, numExamplesToSample, True)
            sampleFeatures = []
            sampleClasses = []
            for x in range(0, len(indexesToSampleFeatures)):
                sampleFeatures.append(features[indexesToSampleFeatures[x]])
                sampleClasses.append(classes[indexesToSampleFeatures[x]])

            """
            b) From the sample in a), choose attributes at random to learn on (in accordance with a provided attribute subsampling rate)
            """
            numAttrToSample = int(len(features[0])*self.attr_subsample_rate)
            indexesToSampleAttributes = np.random.choice(indexes_attributes, numAttrToSample, False)
            notUsedAttributes = list(set(indexes_attributes) - set(indexesToSampleAttributes))
            #delete those attributes from sampleFeatures
            for x in range(0, len(sampleFeatures)):
                sampleFeatures[x] = np.delete(sampleFeatures[x], notUsedAttributes)

            """
            c) Fit a decision tree to the subsample of data we've chosen (to a certain depth)
            """
            t = DecisionTree(self.depth_limit)
            t.fit( sampleFeatures, sampleClasses)

            #print str(b+1) + " trees done"
            self.trees.append(t)

    def classify(self, features):
        # TODO implement classification for a random forest.
        classifications = []
        for feature in features:
            featureClassification = []
            for tree in self.trees:
                featureClassification.append(tree.root.decide(feature))
            if featureClassification.count(1) > featureClassification.count(0):
                classifications.append(1)
            else:
                classifications.append(0)

        return classifications

#TODO: As with the DecisionTree, evaluate the performance of your RandomForest on the dataset for part 2.
# on average your accuracy should be higher than 75%.

#  Optimize the parameters of your random forest for accuracy for a forest of 5 trees.
# (We'll verify these by training one of your RandomForest instances using these parameters
#  and checking the resulting accuracy)

#  Fill out the function below to reflect your answer:

def ideal_parameters():
    ideal_depth_limit = 7 #maximum tree depth (DecisionTree compares depth to this value directly)
    ideal_esr = 0.60 #fraction of examples sampled for each tree
    ideal_asr = 0.60 #fraction of attributes sampled for each tree
    return ideal_depth_limit, ideal_esr, ideal_asr


#num_trees, depth_limit, example_subsample_rate, attr_subsample_rate
dataset = load_csv('part2_data.csv')
ten_folds = generate_k_folds(dataset, 10)

#on average your accuracy should be higher than 75%.
accuracies = []
precisions = []
recalls = []
confusion = []

for fold in ten_folds:
    train, test = fold
    train_features, train_classes = train
    test_features, test_classes = test
    depth_limit, example_sr, attr_sr = ideal_parameters()
    forest = RandomForest(5, depth_limit, example_sr, attr_sr)
    forest.fit( train_features, train_classes)
    output = forest.classify(test_features)

    accuracies.append( accuracy(output, test_classes))
    precisions.append( precision(output, test_classes))
    recalls.append( recall(output, test_classes))
    confusion.append( confusion_matrix(output, test_classes))

print accuracies
print precisions
print recalls
print confusion


[0.8848920863309353, 0.8776978417266187, 0.841726618705036, 0.7985611510791367, 0.8848920863309353, 0.8705035971223022, 0.8776978417266187, 0.8489208633093526, 0.8201438848920863, 0.8848920863309353]
[0.8805970149253731, 0.7796610169491526, 0.7454545454545455, 0.8148148148148148, 0.8852459016393442, 0.8571428571428571, 0.8461538461538461, 0.8857142857142857, 0.7205882352941176, 0.84]
[0.8805970149253731, 0.92, 0.8367346938775511, 0.7096774193548387, 0.8571428571428571, 0.8823529411764706, 0.8870967741935484, 0.8266666666666667, 0.8909090909090909, 0.84]
[[[59, 8], [8, 64]], [[46, 4], [13, 76]], [[41, 8], [14, 76]], [[44, 18], [10, 67]], [[54, 9], [7, 69]], [[60, 8], [10, 61]], [[55, 7], [10, 67]], [[62, 13], [8, 56]], [[49, 6], [19, 65]], [[42, 8], [8, 81]]]


Part 4: Challenge!
-------
10 pts

You've been provided with a sample of data from a research dataset in 'challenge_data.pickle'. It is serialized as a tuple of (features, classes). I have reserved a part of the dataset for testing. The classifier that performs most accurately on the holdout set wins (so optimize for accuracy). To get full points for this part of the assignment, you'll need to get at least an average accuracy of 80% on the data you have, and at least an average accuracy of 60% on the holdout set.

Ties will be broken by submission time.

First place:  +3% on your final grade

Second place: +2% on your final grade

Third place:  +1% on your final grade


In [18]:
class ChallengeClassifier():
    def __init__(self, num_trees=5, depth_limit=0.2, example_subsample_rate=1.0, attr_subsample_rate=1.0):
        # initialize whatever parameters you may need here-
        # this method will be called without parameters
        # so if you add any to make parameter sweeps easier, provide defaults
        self.trees = []
        self.num_trees = num_trees
        self.depth_limit = depth_limit
        self.example_subsample_rate = example_subsample_rate
        self.attr_subsample_rate = attr_subsample_rate

    def fit(self, features, classes):
        import numpy as np
        # TODO implement the above algorithm to build a random forest of decision trees
        self.trees = []

        #index from 0 to length of classes
        indexes_features = list(xrange(len(classes)))
        indexes_attributes = list(xrange(len(features[0])))

        #create B number of trees
        for b in range(0, self.num_trees):
            """
            a) Subsample the examples provided us (with replacement) in accordance with a provided example subsampling rate.
            """
            #choose from features, number of samples, and with replacement
            numExamplesToSample = int(len(features)*self.example_subsample_rate)
            indexesToSampleFeatures = np.random.choice(indexes_features, numExamplesToSample, True)
            sampleFeatures = []
            sampleClasses = []
            for x in range(0, len(indexesToSampleFeatures)):
                sampleFeatures.append(features[indexesToSampleFeatures[x]])
                sampleClasses.append(classes[indexesToSampleFeatures[x]])

            """
            b) From the sample in a), choose attributes at random to learn on (in accordance with a provided attribute subsampling rate)
            """
            numAttrToSample = int(len(features[0])*self.attr_subsample_rate)
            indexesToSampleAttributes = np.random.choice(indexes_attributes, numAttrToSample, False)
            notUsedAttributes = list(set(indexes_attributes) - set(indexesToSampleAttributes))
            #delete those attributes from sampleFeatures
            for x in range(0, len(sampleFeatures)):
                sampleFeatures[x] = np.delete(sampleFeatures[x], notUsedAttributes)

            """
            c) Fit a decision tree to the subsample of data we've chosen (to a certain depth)
            """
            t = FancyDecisionTree(self.depth_limit)
            t.fit( sampleFeatures, sampleClasses)

            #print str(b+1) + " trees done"
            self.trees.append(t)

    def classify(self, features):
        # classify each feature in features as either 0 or 1.
        classifications = []
        for feature in features:
            featureClassification = []
            for tree in self.trees:
                featureClassification.append(tree.root.decide(feature))
            if featureClassification.count(1) > featureClassification.count(0):
                classifications.append(1)
            else:
                classifications.append(0)
        return classifications

class FancyDecisionTree():
    def __init__(self, depth_limit=float('inf')):
        self.root = None
        self.depth_limit = depth_limit

    def fit(self, features, classes):
        self.root = self.__build_tree__(features, classes)

    def __build_tree__(self, features, classes, depth=0):
        import copy
        import numpy as np
        #print "depth is: " + str(depth)
        #print "features: " + str(features)
        #print "classes: " + str(classes)

        #print "features size: " + str(len(features)) + "x" + str(len(features[0]))
        #print "classes size: " + str(len(classes))

        TDNode = DecisionNode(None, None, None, 1)
        FDNode = DecisionNode(None, None, None, 0)
        """
        1) Check for base cases:
            a)If all elements of a list are of the same class, return a leaf node with the appropriate class label.
            b)If a specified depth limit is reached, return a leaf labeled with the most frequent class.
            My addition to algo: If there are no attributes left, return most frequent class
        """
        if classes == [0]*len(classes):
            return FDNode
        elif classes == [1]*len(classes):
            return TDNode

        #depth percent of attributes
        if len(features[0]) == 0 or (self.depth_limit != float('inf') and depth >= int(len(features[0])*self.depth_limit) and (classes.count(1) > 3*classes.count(0) or classes.count(0) > 3*classes.count(1))):
            if classes.count(1) > classes.count(0):
                return TDNode
            else:
                return FDNode

        """
        2) For each attribute alpha: evaluate the normalized information gain gained by splitting on alpha
        """
        bestAttributeIndex = 0
        bestAttributeIndexSplitValue = 0
        bestInfoGain = 0.0
        #go through each attribute
        for x in range(0, len(features[0])):
            #attribute column
            a = [row[x] for row in features]

            uniqueAValues = list(set(a))
            #testing using just middle value
            #uniqueAValues = [uniqueAValues[len(uniqueAValues)/2]]
            #testing getting randomly some values
            uniqueAValues = np.random.choice(uniqueAValues, int(len(uniqueAValues)*0.15) + 1, False)
            for u in uniqueAValues:
                #split on value
                #class labels of the examples where attribute a is greater than the candidate value u
                curr_classes = []
                for y in range(0, len(a)):
                    if a[y] > u:
                        curr_classes.append(classes[y])

                infoGain = information_gain(classes, curr_classes)
                if infoGain > bestInfoGain:
                    bestAttributeIndex = x
                    bestAttributeIndexSplitValue = u
                    bestInfoGain = infoGain

        """
        3) Let alpha_best be the attribute with the highest normalized information gain
        """
        alpha_best = [row[bestAttributeIndex] for row in features]

        """
        4) Create a decision node that splits on alpha_best
        """
        #remove best attribute column
        new_features = copy.deepcopy(features)
        #testing without removing attribute column... in submission I removed column
        """
        for x in range(0, len(new_features)):
            new_features[x] = np.delete(new_features[x], bestAttributeIndex)
        """

        l_features = []
        l_classes = []
        r_features = []
        r_classes = []
        #split cases that will go left and right; left gets the cases where alpha_best is greater than the best split value
        for x in range(0, len(alpha_best)):
            if alpha_best[x] > bestAttributeIndexSplitValue:
                l_features.append(new_features[x])
                l_classes.append(classes[x])
            else:
                r_features.append(new_features[x])
                r_classes.append(classes[x])
        #print "basisv: " + str(bestAttributeIndexSplitValue)
        def RootDFunction(feature):
            if feature[bestAttributeIndex] > bestAttributeIndexSplitValue:
                return 1
            else:
                return 0

        #print "about to build a tree with l features: " + str(l_features)
        #print "about to build a tree with l class: " + str(l_classes)
        #print "about to build a tree with r reatures: " + str(r_features)
        #print "about to build a tree with r class: " + str(r_classes)
        #dn left, right, decisionfunction, class
        root_node = DecisionNode(self.__build_tree__(l_features, l_classes, depth+1), self.__build_tree__(r_features, r_classes, depth+1), RootDFunction)

        """
        5) Recur on the sublists obtained by splitting on alpha_best, and add those nodes as children of node
        """
        # already doing that above

        return root_node

    def classify(self, features):
        # Your output should be a list of class labels (either 0 or 1)
        return [self.root.decide(feature) for feature in features]

#Validate
import pickle
with open('challenge_data.pickle', 'rb') as handle:
    dataset = pickle.load(handle)
ten_folds = generate_k_folds(dataset, 10)

#on average your accuracy should be at least 80% on this data.
accuracies = []
precisions = []
recalls = []
confusion = []

for fold in ten_folds:
    train, test = fold
    train_features, train_classes = train
    test_features, test_classes = test
    forest = ChallengeClassifier()
    forest.fit( train_features, train_classes)
    output = forest.classify(test_features)
    accuracies.append( accuracy(output, test_classes))
    precisions.append( precision(output, test_classes))
    recalls.append( recall(output, test_classes))
    confusion.append( confusion_matrix(output, test_classes))

print accuracies
print precisions
print recalls
print confusion

import numpy as np
print "Mean accuracy: " + str(np.mean(accuracies))

[0.9264069264069265, 0.9307359307359307, 0.9090909090909091, 0.9393939393939394, 0.9393939393939394, 0.9264069264069265, 0.9307359307359307, 0.9264069264069265, 0.9567099567099567, 0.935064935064935]
[0.9382716049382716, 0.9540229885057471, 0.9222222222222223, 0.9382716049382716, 0.9042553191489362, 0.9326923076923077, 0.8936170212765957, 0.9101123595505618, 0.9523809523809523, 0.9230769230769231]
[0.8636363636363636, 0.8736842105263158, 0.8556701030927835, 0.8941176470588236, 0.9444444444444444, 0.9065420560747663, 0.9333333333333333, 0.9, 0.9302325581395349, 0.9130434782608695]
[[[76, 12], [5, 138]], [[83, 12], [4, 132]], [[83, 14], [7, 127]], [[76, 9], [5, 141]], [[85, 5], [9, 132]], [[97, 10], [7, 117]], [[84, 6], [10, 131]], [[81, 9], [8, 133]], [[80, 6], [4, 141]], [[84, 8], [7, 132]]]
Mean accuracy: 0.932034632035
