We will use one-hot encoding to encode the categorical features. Because each feature here takes only two values, a single 0/1 column per feature is enough. The encoding is as follows:

Ear Shape: Pointy = 1, Floppy = 0
Face Shape: Round = 1, Not Round = 0
Whiskers: Present = 1, Absent = 0

Therefore, we have two sets:

X_train: for each example, contains 3 features:

    Ear Shape (1 if pointy, 0 otherwise)
    Face Shape (1 if round, 0 otherwise)
    Whiskers (1 if present, 0 otherwise)

y_train: the label for each example, indicating whether the animal is a cat

    1 if the animal is a cat
    0 otherwise
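
As a minimal sketch of how such an encoding could be produced, assume the raw data arrives as a pandas DataFrame of string labels. The three sample rows and the names `raw`, `positive_label`, and `X_demo` are made up for illustration; they are not part of the dataset used below.

```python
import pandas as pd

# Hypothetical raw data: three animals described with string labels.
raw = pd.DataFrame({
    "Ear Shape": ["Pointy", "Floppy", "Pointy"],
    "Face Shape": ["Round", "Not Round", "Not Round"],
    "Whiskers": ["Present", "Present", "Absent"],
})

# The label that maps to 1 for each feature; any other label maps to 0.
positive_label = {"Ear Shape": "Pointy", "Face Shape": "Round", "Whiskers": "Present"}

# Compare each column with its positive label and cast the booleans to 0/1.
encoded = pd.DataFrame(
    {col: (raw[col] == label).astype(int) for col, label in positive_label.items()}
)
X_demo = encoded.to_numpy()
print(X_demo)
```

Each row of `X_demo` then has the same layout as a row of X_train: Ear Shape, Face Shape, Whiskers.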

In [26]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [7]:
X_train = np.array([
 [1, 1, 1],
 [0, 0, 1],
 [0, 1, 0],
 [1, 0, 1],
 [1, 1, 1],
 [1, 1, 0],
 [0, 0, 0],
 [1, 1, 0],
 [0, 1, 0],
 [0, 1, 0]
 ])

y_train = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])

In [9]:
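def entropy(p):
    """Binary entropy (in bits) of a node where a fraction p of the examples are cats."""
    # A pure node (all cats or no cats) has zero entropy; this also avoids log2(0).
    if p == 0 or p == 1:
        return 0
    else:
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)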
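
For reference, the quantity computed by `entropy` is the binary entropy, where p is the fraction of examples at a node that are cats:

$$
H(p) = -p \log_2(p) - (1 - p) \log_2(1 - p), \qquad \text{with the convention } 0 \log_2 0 = 0 .
$$

That convention is why the function returns 0 when p is 0 or 1: a node containing a single class is perfectly pure. At the other extreme, an evenly mixed node gives $H(0.5) = 1$, the maximum.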

In [14]:
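def split(X, index_feature):
    """Split example indices by a binary feature: value 1 goes left, anything else goes right."""
    left = []   # indices of examples where the feature equals 1
    right = []  # indices of all remaining examples
    for i, x in enumerate(X):
        if x[index_feature] == 1:
            left.append(i)
        else:
            right.append(i)
    return left, right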
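
To see what such a split produces, here is a small self-contained check on the first feature (Ear Shape), written as a vectorized NumPy alternative to the loop. The name `split_vectorized` is hypothetical and not part of the original notebook; it is a sketch that mirrors the loop's rule (1 goes left, anything else goes right).

```python
import numpy as np

X_train = np.array([
    [1, 1, 1], [0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 1],
    [1, 1, 0], [0, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0],
])

def split_vectorized(X, index_feature):
    """Return (left, right) index lists: left where the feature is 1, right otherwise."""
    column = X[:, index_feature]
    left = np.flatnonzero(column == 1).tolist()
    right = np.flatnonzero(column != 1).tolist()
    return left, right

left, right = split_vectorized(X_train, 0)
print(left)   # [0, 3, 4, 5, 7]
print(right)  # [1, 2, 6, 8, 9]
```

The two lists hold row indices, not the rows themselves, so they can be used to index both X_train and y_train.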


In [18]:
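def weighted_entropy(X, y, left, right):
    """Entropy of the two branches, each weighted by its share of the examples."""
    w_left = len(left) / len(X)
    w_right = len(right) / len(X)
    # An empty branch has weight 0; use p = 0 for it to avoid dividing by zero.
    p_left = sum(y[left]) / len(left) if len(left) > 0 else 0
    p_right = sum(y[right]) / len(right) if len(right) > 0 else 0

    return w_left * entropy(p_left) + w_right * entropy(p_right)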

In [19]:
left, right = split(X_train, 0)
weighted_entropy(X_train, y_train, left, right)

0.7219280948873623
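
This value can be checked by hand. Splitting on Ear Shape sends 5 of the 10 animals to the left branch (pointy ears), 4 of which are cats, and 5 to the right branch (floppy ears), 1 of which is a cat:

$$
\frac{5}{10} H\!\left(\tfrac{4}{5}\right) + \frac{5}{10} H\!\left(\tfrac{1}{5}\right)
\approx 0.5 \times 0.7219 + 0.5 \times 0.7219 = 0.7219
$$

The two branch entropies are equal because $H(p) = H(1 - p)$.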

In [20]:
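def information_gain(X, y, left_indices, right_indices):
    """Reduction in entropy obtained by splitting the node into left and right branches."""
    p_node = sum(y) / len(y)  # fraction of cats at the node before the split
    h_node = entropy(p_node)
    w_entropy = weighted_entropy(X, y, left_indices, right_indices)
    return h_node - w_entropy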

In [21]:
information_gain(X_train, y_train, left, right)

0.2780719051126377
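
Written out, the information gain is the entropy at the node minus the weighted entropy of its two branches. The superscript notation below is chosen here for illustration and does not appear in the code:

$$
\text{Information gain} = H\!\left(p^{\text{node}}\right) - \left( w^{\text{left}} H\!\left(p^{\text{left}}\right) + w^{\text{right}} H\!\left(p^{\text{right}}\right) \right)
$$

At the root, 5 of the 10 animals are cats, so $H(0.5) = 1$, and the Ear Shape split gives $1 - 0.7219 \approx 0.2781$.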

In [22]:
for i, feature_name in enumerate(['Ear Shape', 'Face Shape', 'Whiskers']):
    left_indices, right_indices = split(X_train, i)
    i_gain = information_gain(X_train, y_train, left_indices, right_indices)
    print(f"Feature: {feature_name}, information gain if we split the root node using this feature: {i_gain:.2f}")

Feature: Ear Shape, information gain if we split the root node using this feature: 0.28
Feature: Face Shape, information gain if we split the root node using this feature: 0.03
Feature: Whiskers, information gain if we split the root node using this feature: 0.12
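
Since a higher information gain means purer branches, the root split would go to the feature with the largest gain. The following is a minimal, self-contained sketch of that selection step. The helper `gain_for_feature` is hypothetical: it folds the split and the gain computation into one function and is not the notebook's `information_gain`.

```python
import numpy as np

X_train = np.array([
    [1, 1, 1], [0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 1],
    [1, 1, 0], [0, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0],
])
y_train = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])
feature_names = ["Ear Shape", "Face Shape", "Whiskers"]

def entropy(p):
    """Binary entropy in bits; a pure node (p of 0 or 1) has entropy 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def gain_for_feature(X, y, index_feature):
    """Information gain from splitting on one binary feature (hypothetical helper)."""
    mask = X[:, index_feature] == 1
    h_node = entropy(y.mean())
    w_entropy = 0.0
    for branch in (y[mask], y[~mask]):
        if len(branch) > 0:  # an empty branch contributes nothing
            w_entropy += len(branch) / len(y) * entropy(branch.mean())
    return h_node - w_entropy

gains = [gain_for_feature(X_train, y_train, i) for i in range(X_train.shape[1])]
best = int(np.argmax(gains))
print(f"Best feature to split on at the root: {feature_names[best]}")
```

On this dataset the sketch selects Ear Shape, in line with the gains printed above.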
