When building a decision tree, you want each split to produce partitions that are “purer” (i.e., closer to containing only data from a single class). If a partition is pure, you can easily and confidently assign its label to any new data point that falls within it. An impurity metric measures how mixed the labels in a partition are, so you can compare one candidate partition against another.

Gini impurity (two classes, labels 0 and 1):
    0 == all labels are the same (all 0 or all 1)
    .5 == half the labels are 0, the other half are 1 (the maximum for two classes)

Impurity reaches a maximum value for a set of labels that are evenly split between all possible values.

Intuitively, the impurity of a set of points is higher when the points have many different labels and lower when most points have the same label.
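
To make the intuition concrete, here is a minimal sketch of Gini impurity, computed as one minus the sum of the squared class proportions. It assumes the labels arrive as a flat sequence; `giniImpurity` is a hypothetical helper introduced for illustration and is not used elsewhere in this notebook.

```python
import numpy as np

def giniImpurity(labels):
    '''
        Hypothetical helper: Gini impurity of a sequence of class labels,
        computed as 1 - sum(p_k ** 2) over the class proportions p_k.
    '''
    # Count how many times each distinct label occurs.
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)

print(giniImpurity([0, 0, 0, 0]))  # all labels identical -> 0.0
print(giniImpurity([0, 0, 1, 1]))  # evenly split between two classes -> 0.5
print(giniImpurity([0, 1, 2, 3]))  # evenly split among four classes -> 0.75
```

With two classes the maximum is 0.5; with more classes an even split pushes the value closer to 1 without reaching it.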

In [1]:
import numpy as np

In [11]:
def getEntropy(prob1, prob2):
    '''
        Returns the entropy (in bits) of a binary variable whose two possible
        outcomes occur with probabilities prob1 and prob2.
    '''
    # Each outcome contributes -p * log2(p) to the entropy.
    # Both probabilities must be strictly positive: np.log2(0) is -inf,
    # so a zero probability would make the result nan.
    first_term = -prob1 * np.log2(prob1)
    second_term = -prob2 * np.log2(prob2)

    result = first_term + second_term

    return result
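
For reference, the quantity `getEntropy` computes can be written as the Shannon entropy of a two-outcome variable, measured in bits. Here p_1 and p_2 stand for the two probabilities passed in, and the formula assumes they sum to 1:

```latex
H(p_1, p_2) = -p_1 \log_2 p_1 - p_2 \log_2 p_2, \qquad p_1 + p_2 = 1
```

An even split (p_1 = p_2 = 1/2) gives the maximum of 1 bit, and the value approaches 0 as either probability approaches 1.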

In [25]:
rootEntropy = getEntropy(16/30, 14/30)

In [13]:
balanceLessThan50k = getEntropy(12/13, 1/13)

In [14]:
balanceLessThan50k

0.39124356362925566

In [16]:
balanceGreaterThanOrEqualTo50k = getEntropy(4/17, 13/17)

In [17]:
balanceGreaterThanOrEqualTo50k

0.7871265862012691

In [28]:
# Labels: whether an individual is likely to pay off a debt or not.
# Features: the balance of the individual's account and whether they rent or own their residence.
# Splitting on balance < 50k puts 13 of the 30 points in one partition and 17 in the other.
# Weighting each partition's entropy by its share of the points gives the entropy after the split;
# the split reduces the entropy by 0.38121435556157335.

entropy = (13/30) * balanceLessThan50k + (17/30) * balanceGreaterThanOrEqualTo50k

In [23]:
entropy

0.6155772764200632

In [26]:
entropyReduction = rootEntropy - entropy

In [27]:
entropyReduction

0.38121435556157335
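
The steps above (entropy of each partition, a size-weighted average, then subtraction from the root entropy) can be folded into one function that works from raw class counts. This is only a sketch under that assumption, not the notebook's own code; `entropyFromCounts` and `informationGain` are hypothetical names introduced here.

```python
import numpy as np

def entropyFromCounts(counts):
    '''
        Hypothetical helper: entropy in bits of a partition, given how many
        points fall into each class. Classes with zero points are skipped.
    '''
    counts = np.asarray(counts, dtype=float)
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

def informationGain(parentCounts, childCounts):
    '''
        Hypothetical helper: parent entropy minus the size-weighted average
        entropy of the child partitions.
    '''
    total = float(np.sum(parentCounts))
    weighted = sum((np.sum(c) / total) * entropyFromCounts(c) for c in childCounts)
    return entropyFromCounts(parentCounts) - weighted

# Class counts implied by the cells above: 16/14 at the root,
# 12/1 in the 13-point partition and 4/13 in the 17-point partition.
gain = informationGain([16, 14], [[12, 1], [4, 13]])
print(round(gain, 4))  # 0.3812
```

Working from counts rather than precomputed probabilities keeps the partition weights (13/30 and 17/30 here) from being typed by hand, and it extends to more than two classes or more than two partitions.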