# Extra Material: Entropy

In this notebook, we look at entropy values for a data set of `N` points. First, we assign the `N` points to 2 classes. Then, we assign the points to 3 classes. In each case, we vary the number of data points in each class and compare the resulting entropy values. In addition to the class sizes (`n_c`), we compute the probability that a data point belongs to a given class (`p_c`).
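For reference, the quantity compared throughout is the Shannon entropy of the class distribution. The form below uses base-2 logarithms to match the `np.log2` call in `compute_entropy`. The symbols $H$ and $K$ (number of classes) are notation introduced here, and a class with $p_c = 0$ is taken to contribute $0$, which is the usual convention:

$$
H = -\sum_{c=0}^{K-1} p_c \log_2 p_c, \qquad p_c = \frac{n_c}{N}
$$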

Look at the results and answer the following questions:
- What distribution leads to small entropy values?
- What distribution leads to large entropy values?
- Which distributions have the same entropy? Why?
- How do the entropy values differ between the 2-class and 3-class examples?

In [1]:
import pandas as pd
import numpy as np

In [2]:
N = 90  # number of data points

In [3]:
def compute_entropy(y):
    """Return the entropy (in bits) of the class labels in the array `y`."""
    # number of samples
    n = len(y)
    # unique classes in `y`
    classes = set(y)

    # number of observations in each class
    n_c = np.zeros(len(classes))
    # probability for a sample to be in each class
    p_c = np.zeros(len(classes))
    # entropy contribution of each class
    e_c = np.zeros(len(classes))

    # computation; only classes present in `y` are visited,
    # so every p_c is positive and log2 is well defined
    for i, c in enumerate(classes):
        n_c[i] = np.sum(y == c)
        p_c[i] = n_c[i] / n
        e_c[i] = -p_c[i] * np.log2(p_c[i])
    # sanity check: the class counts add up to the sample size
    assert n == sum(n_c)

    # total entropy
    e_p = np.sum(e_c)

    return e_p
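As a cross-check on `compute_entropy`, the same value can be computed without an explicit loop. The following is a minimal sketch, not the notebook's own method: the helper names `entropy_from_counts` and `entropy_from_labels` are introduced here for illustration, and empty classes are dropped so that they contribute zero.

```python
import numpy as np

def entropy_from_counts(counts):
    """Entropy (in bits) of a class distribution given by class counts."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]          # empty classes contribute nothing
    p = counts / counts.sum()            # class probabilities
    return float(-np.sum(p * np.log2(p)))

def entropy_from_labels(y):
    """Entropy (in bits) of an array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    return entropy_from_counts(counts)

# balanced two-class example: 45 points per class
y = np.concatenate([np.zeros(45), np.ones(45)])
print(entropy_from_labels(y))
```

Working from counts rather than labels avoids building the label array when only the class sizes matter.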

## 2 classes

Build all possible class-size combinations:
- `N` points in class 0 and 0 points in class 1
- `N-1` points in class 0 and 1 point in class 1
- ...
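With two classes, the entropy depends on a single number, the share of points in class 0. The sketch below evaluates that closed form for every class-size combination at once. The helper name `binary_entropy` is an assumption made for illustration, and the `np.where` guards implement the convention that an empty class contributes zero.

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in bits) of a two-class split with class shares p and 1 - p."""
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    # replace zero shares by 1 inside the log so that log2 is never called on 0;
    # the outer np.where then sets the contribution of an empty class to 0
    h_p = np.where(p > 0, -p * np.log2(np.where(p > 0, p, 1.0)), 0.0)
    h_q = np.where(q > 0, -q * np.log2(np.where(q > 0, q, 1.0)), 0.0)
    return h_p + h_q

N = 90
n_0 = np.arange(N + 1)           # 0, 1, ..., N points in class 0
e = binary_entropy(n_0 / N)      # one entropy value per combination
print(e[45])                     # 45 / 45 split
print(e[1], e[89])               # 1 / 89 split and 89 / 1 split
```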

In [4]:
list_entropy = []
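# note: range(N) yields n_0 = 0, ..., N-1, so the mirror case n_0 = N
# (all points in class 0, entropy 0) is not generated; range(N + 1) would include it
for n_0 in range(N):
    n_1 = N-n_0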
    y_n = np.concatenate([np.zeros(n_0), np.ones(n_1)])
    list_entropy.append(
        pd.DataFrame({
            "n_0": [n_0],
            "p_0": n_0 / N,
            "n_1": n_1,
            "p_1": n_1 / N,
            "e": compute_entropy(y_n),
        })
    )
entropy_n = pd.concat(list_entropy)

In [5]:
entropy_n.sort_values("e", ascending=True).head(5)

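| | n_0 | p_0 | n_1 | p_1 | e |
| --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0.0 | 90 | 1.0 | 0.0 |
| 0 | 89 | 0.988889 | 1 | 0.011111 | 0.088072 |
| 0 | 1 | 0.011111 | 89 | 0.988889 | 0.088072 |
| 0 | 2 | 0.022222 | 88 | 0.977778 | 0.153742 |
| 0 | 88 | 0.977778 | 2 | 0.022222 | 0.153742 |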


In [6]:
entropy_n.sort_values("e", ascending=False).head(5)

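| | n_0 | p_0 | n_1 | p_1 | e |
| --- | --- | --- | --- | --- | --- |
| 0 | 45 | 0.5 | 45 | 0.5 | 1.0 |
| 0 | 44 | 0.488889 | 46 | 0.511111 | 0.999644 |
| 0 | 46 | 0.511111 | 44 | 0.488889 | 0.999644 |
| 0 | 47 | 0.522222 | 43 | 0.477778 | 0.998575 |
| 0 | 43 | 0.477778 | 47 | 0.522222 | 0.998575 |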


## 3 classes

Build all possible class-size combinations:
- `N` points in class 0, 0 points in class 1 and 0 points in class 2
- `N-1` points in class 0, 1 point in class 1 and 0 points in class 2
- ...
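The enumeration described above can be sketched on the class sizes alone, which also shows how many combinations there are. This is an illustrative sketch: the names `K` and `sizes` are introduced here, and the count is checked against the standard stars-and-bars formula.

```python
from math import comb

N = 90   # number of data points
K = 3    # number of classes

# all (n_0, n_1, n_2) with non-negative entries that sum to N
sizes = [
    (n_0, n_1, N - n_0 - n_1)
    for n_0 in range(N + 1)
    for n_1 in range(N + 1 - n_0)
]

# stars-and-bars: the number of such combinations is C(N + K - 1, K - 1)
assert len(sizes) == comb(N + K - 1, K - 1)
print(len(sizes))  # 4186
```

The third class size is determined by the first two, which is why two nested loops are enough for three classes.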

In [7]:
list_entropy = []
for n_0 in range(N+1):
    for n_1 in range(N+1-n_0):
        n_2 = N-n_0-n_1
        y_n = np.concatenate([np.zeros(n_0), np.ones(n_1), np.ones(n_2)+1])
        list_entropy.append(
            pd.DataFrame({
                "n_0": [n_0],
                "p_0": n_0/N,
                "n_1": n_1,
                "p_1": n_1/N,
                "n_2": n_2,
                "p_2": n_2/N,
                "e": compute_entropy(y_n),
            })
        )
entropy_n = pd.concat(list_entropy)

In [8]:
entropy_n.sort_values("e", ascending=False).head(10)

| | n_0 | p_0 | n_1 | p_1 | n_2 | p_2 | e |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 30 | 0.333333 | 30 | 0.333333 | 30 | 0.333333 | 1.584963 |
| 0 | 31 | 0.344444 | 30 | 0.333333 | 29 | 0.322222 | 1.584428 |
| 0 | 30 | 0.333333 | 31 | 0.344444 | 29 | 0.322222 | 1.584428 |
| 0 | 31 | 0.344444 | 29 | 0.322222 | 30 | 0.333333 | 1.584428 |
| 0 | 29 | 0.322222 | 31 | 0.344444 | 30 | 0.333333 | 1.584428 |
| 0 | 29 | 0.322222 | 30 | 0.333333 | 31 | 0.344444 | 1.584428 |
| 0 | 30 | 0.333333 | 29 | 0.322222 | 31 | 0.344444 | 1.584428 |
| 0 | 29 | 0.322222 | 32 | 0.355556 | 29 | 0.322222 | 1.583376 |
| 0 | 32 | 0.355556 | 29 | 0.322222 | 29 | 0.322222 | 1.583376 |
| 0 | 29 | 0.322222 | 29 | 0.322222 | 32 | 0.355556 | 1.583376 |


In [9]:
entropy_n.sort_values("e", ascending=True).head(10)

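| | n_0 | p_0 | n_1 | p_1 | n_2 | p_2 | e |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0.0 | 0 | 0.0 | 90 | 1.0 | 0.0 |
| 0 | 0 | 0.0 | 90 | 1.0 | 0 | 0.0 | 0.0 |
| 0 | 90 | 1.0 | 0 | 0.0 | 0 | 0.0 | 0.0 |
| 0 | 0 | 0.0 | 89 | 0.988889 | 1 | 0.011111 | 0.088072 |
| 0 | 0 | 0.0 | 1 | 0.011111 | 89 | 0.988889 | 0.088072 |
| 0 | 1 | 0.011111 | 0 | 0.0 | 89 | 0.988889 | 0.088072 |
| 0 | 89 | 0.988889 | 0 | 0.0 | 1 | 0.011111 | 0.088072 |
| 0 | 1 | 0.011111 | 89 | 0.988889 | 0 | 0.0 | 0.088072 |
| 0 | 89 | 0.988889 | 1 | 0.011111 | 0 | 0.0 | 0.088072 |
| 0 | 2 | 0.022222 | 88 | 0.977778 | 0 | 0.0 | 0.153742 |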


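Two takeaways related to our discussion in Lecture 06:
1. You can scale entropy values by a positive constant (for example, by changing the base of the logarithm) without changing how a tree would split observations.
1. The maximum possible entropy depends on the number of classes: with base-2 logarithms, the tables above show a maximum of 1.0 for 2 classes and 1.584963 (that is, log2(3)) for 3 classes.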
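Both points can be checked numerically. The sketch below rests on two assumptions that are not taken from the lecture: "scaling" is read as multiplying by a positive constant, illustrated here by switching the logarithm base, and the helper `entropy` with its `base` parameter is introduced only for this example.

```python
import numpy as np

def entropy(p, base=2.0):
    """Entropy of a probability vector `p`, using logarithms of the given base."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # empty classes contribute nothing
    return float(-np.sum(p * np.log(p)) / np.log(base))

# 1. Changing the log base multiplies every entropy value by the same
#    positive factor, so the ranking of distributions is unchanged.
dists = [[0.5, 0.5], [0.9, 0.1], [0.7, 0.3]]
bits = [entropy(p, base=2) for p in dists]       # base-2 logarithm
nats = [entropy(p, base=np.e) for p in dists]    # natural logarithm
print(np.argsort(bits), np.argsort(nats))

# 2. For K classes, the uniform distribution attains the maximum log2(K).
for K in (2, 3, 4):
    print(K, entropy(np.full(K, 1 / K)), np.log2(K))
```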

<br>
<br>
&mdash; <br>
Sebastian Gabel <br>
`Learning from Big Data`, Module 2, Lecture 06 <br>
2021/22 <br>