# Entropy
The concept of entropy comes from physics, where it gives a measure of the number of configurations that a thermodynamic system can take. A large entropy indicates more possible configurations and, hence, more uncertainty in the system.

In machine learning, entropy gauges the level of impurity or uncertainty in data. Models such as decision trees use entropy to find features that, when used to split the data, produce the lowest possible entropies in the resulting groups. The reduction in entropy achieved by a split is called information gain.
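The relationship between entropy and information gain can be sketched as follows. This is a minimal, self-contained illustration that assumes the common definition in which the parent's entropy is compared with the size-weighted average entropy of the child groups. The names `shannon_entropy` and `information_gain` are hypothetical and not part of any library.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Entropy of a sequence of labels, in nats (natural log)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # sum of p_i * ln(1 / p_i), which equals -sum of p_i * ln(p_i)
    return sum((c / total) * math.log(total / c) for c in counts.values())


def information_gain(parent, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(g) / total * shannon_entropy(g) for g in groups)
    return shannon_entropy(parent) - weighted


# Hypothetical example: six labels split two different ways
parent = ['A', 'A', 'A', 'B', 'B', 'B']
print(information_gain(parent, [['A', 'A', 'B'], ['A', 'B', 'B']]))  # about 0.057
print(information_gain(parent, [['A', 'A', 'A'], ['B', 'B', 'B']]))  # ln 2, about 0.693
```

The mixed split barely reduces the uncertainty, while the perfectly separating split removes all of it, so its gain equals the parent's full entropy.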

Say a decision tree is being trained to predict whether voters in a dataset vote Green or Liberal. When performing the first split, which feature partitions the dataset into two groups, Green voters and Liberal voters, with the highest accuracy? It is the feature whose split produces the lowest entropy in the two resulting groups.

For the sake of argument, let's say that the voter dataset consists of 50 females and 50 males, and that all the females vote Green and all the males vote Liberal. In this scenario, if the decision tree uses sex to split the dataset into Green and Liberal voters, then both the resultant groups would have 0 entropy.
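To make split selection concrete, the sketch below scores each candidate feature by the size-weighted entropy of the groups it produces and picks the lowest. It assumes the hypothetical voter data described above (50 females voting Green, 50 males voting Liberal) plus a made-up `region` feature that has nothing to do with voting; `region`, `weighted_split_entropy` and the dictionary record layout are illustrative assumptions, not part of the original notebook.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Entropy of a non-empty list of labels, in nats."""
    total = len(labels)
    counts = Counter(labels)
    return sum((c / total) * math.log(total / c) for c in counts.values())


def weighted_split_entropy(records, feature, target):
    """Size-weighted entropy of the target after grouping records by a feature."""
    groups = defaultdict(list)
    for r in records:
        groups[r[feature]].append(r[target])
    n = len(records)
    return sum(len(g) / n * entropy(g) for g in groups.values())


# Hypothetical voters: sex determines the vote; the made-up region does not
voters = []
for i in range(50):
    region = 'North' if i % 2 else 'South'
    voters.append({'sex': 'F', 'region': region, 'vote': 'Green'})
    voters.append({'sex': 'M', 'region': region, 'vote': 'Liberal'})

scores = {f: weighted_split_entropy(voters, f, 'vote') for f in ['sex', 'region']}
best = min(scores, key=scores.get)
print(scores)  # sex: 0.0, region: about 0.693
print(best)    # sex
```

Splitting on `region` leaves each group half Green and half Liberal, so it removes no uncertainty, whereas splitting on `sex` leaves two pure groups.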

The formula for entropy is as follows:

![entropy](images/entropy_formula.png)
 
Here, <i>n</i> is the number of distinct labels and <i>p<sub>i</sub></i> is the probability (relative frequency) of the <i>i</i>th label in the data.
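As a worked example, here is the formula written in LaTeX and applied to the unsplit voter dataset (50 Green and 50 Liberal voters, so n = 2 and each p<sub>i</sub> = 0.5), followed by either group after the split by sex (a single label with p<sub>1</sub> = 1). This assumes the natural logarithm, which is what the `entropy` function in this notebook uses.

```latex
H = -\sum_{i=1}^{n} p_i \ln p_i
  = -\left(0.5 \ln 0.5 + 0.5 \ln 0.5\right)
  = \ln 2 \approx 0.693

H_{\text{after split}} = -\left(1 \cdot \ln 1\right) = 0
```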

Let's build a function to calculate entropy.

```python
import pandas as pd
import numpy as np
```

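```python
def entropy(labels):
    """
    Return the entropy of labels, computed with the natural
    logarithm (so the result is in nats).
    """
    if len(labels) == 0:
        return 0
    s = pd.Series(labels)
    # Relative frequency of each distinct label, i.e. p_i
    s_probs = s.value_counts(normalize=True)
    # H = -sum(p_i * ln(p_i))
    return - np.sum(s_probs * np.log(s_probs))
```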
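The function above uses `np.log`, the natural logarithm, so its results are in nats. Many texts instead use log base 2, which measures entropy in bits. The sketch below is a hypothetical standard-library variant (`entropy_with_base` is an illustrative name, not part of the original notebook) that takes the base as a parameter; changing the base only rescales the result by a constant factor.

```python
import math
from collections import Counter


def entropy_with_base(labels, base=math.e):
    """Entropy of a sequence of labels in the given log base (e gives nats, 2 gives bits)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # -sum of p_i * log_base(p_i), written as sum of p_i * log_base(1 / p_i)
    return sum((c / total) * math.log(total / c, base) for c in counts.values())


votes = ['Green'] * 50 + ['Liberal'] * 50
print(entropy_with_base(votes))          # ln 2, about 0.693 nats
print(entropy_with_base(votes, base=2))  # 1.0 bit
```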

In the hypothetical voter dataset above, we claimed that splitting the data by sex would produce two groups with zero entropy, because all the females vote Green and all the males vote Liberal. Let's test this claim with the entropy function above.

```python
# After splitting by sex: all 50 females vote Green, all 50 males vote Liberal
green_voters = ['Green'] * 50
liberal_voters = ['Liberal'] * 50
```

Check the values in the voter groups.

```python
green_voters[:5]
```

```
['Green', 'Green', 'Green', 'Green', 'Green']
```

```python
liberal_voters[-5:]
```

```
['Liberal', 'Liberal', 'Liberal', 'Liberal', 'Liberal']
```

Calculate the entropies.

```python
print("The entropies of green_voters and liberal_voters are {} and {}, respectively"
      .format(entropy(green_voters), entropy(liberal_voters)))
```

```
The entropies of green_voters and liberal_voters are -0.0 and -0.0, respectively
```

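Both groups have zero entropy, as claimed. (The `-0.0` is a floating-point negative zero: with a single label, ln(1) = 0, so the sum inside the function is `0.0`, and negating it gives `-0.0`, which equals zero.)

Now let's check the entropy of the dataset before it was split by sex.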

```python
entropy(green_voters + liberal_voters)
```

```
0.6931471805599453
```

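As can be seen, the entropy was much higher before the split. Its value, ln 2 (about 0.693), is the maximum possible for two equally likely labels. Splitting by sex lowers it to 0, so the information gain of this split is the full 0.693.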

What is the entropy if each of the 100 voters voted for a different political party, with the parties numbered 0 to 99? Let's find out.

```python
entropy(range(100))
```

```
4.605170185988082
```

Unlike a voter from the Green or Liberal group in the hypothetical dataset above, whose vote is certain, a randomly selected voter in this dataset of 100 could have voted for any of 100 parties, so there is little certainty about their voting intention. This is reflected in the much higher entropy, ln 100 (about 4.605).
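For n equally likely labels the entropy is ln n, the largest value possible for n labels, and any imbalance between the labels lowers it. The sketch below checks this with a hypothetical standard-library helper, `entropy_of_counts`, comparing the uniform 100-party case with a made-up skewed case in which most of the 100 voters back a single party.

```python
import math


def entropy_of_counts(counts):
    """Entropy, in nats, of a distribution given as raw label counts."""
    total = sum(counts)
    # Skip zero counts: by convention 0 * ln(0) contributes nothing
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


uniform = [1] * 100       # 100 voters, each for a different party
skewed = [91] + [1] * 9   # hypothetical: 91 voters back one party, 9 others each differ

print(entropy_of_counts(uniform), math.log(100))  # both about 4.605
print(entropy_of_counts(skewed))                  # much lower, about 0.50
```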