# Data Science with Python : ID3 Classification Algorithm #1822

ID3 is one of the basic algorithms for building decision trees. It builds the tree using a top-down, greedy approach. A decision tree has decision nodes and leaf nodes: a decision node has two or more branches, and a leaf node represents a decision (a class label). Leaf nodes cannot be split further.

### ID3 Algorithm :
This algorithm assumes that all the attributes are discrete and the classification is binary.
###### Step 1. 
Calculate the entropy, i.e. the amount of uncertainty, in the data set.

$$Entropy(S) = -\frac{p}{p+n}\log_{2}\left(\frac{p}{p+n}\right) - \frac{n}{p+n}\log_{2}\left(\frac{n}{p+n}\right)$$

Here p and n are the numbers of positive and negative examples in S.
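
As a quick numeric check of the entropy formula, the sketch below evaluates it for given counts of positive and negative examples. The function name `binary_entropy` is a hypothetical helper introduced here, and it assumes the usual convention that an empty class contributes nothing, so a pure set has zero entropy.

```python
import math

def binary_entropy(p, n):
    """Entropy of a set with p positive and n negative examples (hypothetical helper)."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # skip empty classes: 0 * log2(0) is taken to be 0
            fraction = count / total
            result -= fraction * math.log2(fraction)
    return result

print(binary_entropy(5, 5))   # evenly mixed set: maximum uncertainty
print(binary_entropy(9, 2))   # mostly positive set: lower entropy
print(binary_entropy(4, 0))   # pure set: no uncertainty
```

An evenly mixed set gives the largest value and a pure set gives zero, which is why ID3 prefers splits that produce purer subsets.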
###### Step 2. 
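Then find the average information entropy of each attribute, i.e. the entropy of every subset produced by splitting on that attribute, weighted by the subset's share of the examples:

$$I(Attribute) = \sum_{i}\frac{p_{i}+n_{i}}{p+n}Entropy(S_{i})$$

The sum runs over the values i of the attribute; S_i is the subset of examples that take value i, and p_i and n_i are its positive and negative counts.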
###### Step 3. 
Find the information gain for each attribute. It is the difference between entropy before splitting and average entropy after splitting.

$$Gain = Entropy(S) - I(Attribute)$$
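
To make Steps 2 and 3 concrete, here is a small sketch that computes I(Attribute) and the gain from per-value class counts. The helper names and the `(p_i, n_i)` tuple layout are assumptions made for this illustration; the counts are those of the Outlook attribute in the PlayTennis table used later in this post.

```python
import math

def binary_entropy(p, n):
    """Entropy of a set with p positive and n negative examples."""
    total = p + n
    # Empty classes are skipped, so a pure set has entropy 0
    return sum(-(c / total) * math.log2(c / total) for c in (p, n) if c)

def average_information(subsets):
    """Weighted average entropy; subsets is a list of (p_i, n_i) pairs."""
    total = sum(p_i + n_i for p_i, n_i in subsets)
    return sum((p_i + n_i) / total * binary_entropy(p_i, n_i) for p_i, n_i in subsets)

def gain(p, n, subsets):
    """Entropy before the split minus the average entropy after it."""
    return binary_entropy(p, n) - average_information(subsets)

# (Yes, No) counts per Outlook value: Overcast, Rain, Sunny
outlook = [(4, 0), (3, 0), (2, 2)]
print("I(Outlook)    =", round(average_information(outlook), 4))
print("Gain(Outlook) =", round(gain(9, 2, outlook), 4))
```

Only the Sunny subset is mixed, so it is the only one that contributes to the weighted average.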

###### Step 4. 
Select the Highest Information Gain Attribute.

###### Step 5. 
Repeat the steps above on each branch, using the remaining attributes, until every subset is pure or no attributes are left.

## Building a Decision Tree :

Now we will walk through a step-by-step implementation of the ID3 algorithm. We will use a simple dataset to build a decision tree.

In [1]:
#We will use pandas for manipulating the dataset.
import pandas as pd

#We will read the dataset (csv file) and load it into pandas dataframe.
my_dataframe = pd.read_csv('Downloads/PlayTennis.csv')
print("\n Our Data Set is:\n", my_dataframe)


 Our Data Set is:
      Outlook Temperature Humidity Play Tennis
0      Sunny         Hot     High          No
1   Overcast         Hot     High         Yes
2       Rain        Mild     High         Yes
3       Rain        Cool   Normal         Yes
4   Overcast        Cool   Normal         Yes
5      Sunny        Mild     High          No
6      Sunny        Cool   Normal         Yes
7       Rain        Mild   Normal         Yes
8      Sunny        Mild   Normal         Yes
9   Overcast        Mild     High         Yes
10  Overcast         Hot   Normal         Yes
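
If `PlayTennis.csv` is not at hand, the same table can be reconstructed inline. This is a sketch that simply copies the eleven rows printed above into a `DataFrame`; the variable name `rows` is introduced here for illustration.

```python
import pandas as pd

# Rows copied from the printed data set: Outlook, Temperature, Humidity, Play Tennis
rows = [
    ("Sunny", "Hot", "High", "No"),
    ("Overcast", "Hot", "High", "Yes"),
    ("Rain", "Mild", "High", "Yes"),
    ("Rain", "Cool", "Normal", "Yes"),
    ("Overcast", "Cool", "Normal", "Yes"),
    ("Sunny", "Mild", "High", "No"),
    ("Sunny", "Cool", "Normal", "Yes"),
    ("Rain", "Mild", "Normal", "Yes"),
    ("Sunny", "Mild", "Normal", "Yes"),
    ("Overcast", "Mild", "High", "Yes"),
    ("Overcast", "Hot", "Normal", "Yes"),
]
my_dataframe = pd.DataFrame(
    rows, columns=["Outlook", "Temperature", "Humidity", "Play Tennis"]
)

# Class balance of the target column
print(my_dataframe["Play Tennis"].value_counts())
```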


In [2]:
# Now we will fetch the attribute names from input dataset
target = my_dataframe.keys()[-1]
print('Our Target Attribute is ---> ', target)
attribute_names = list(my_dataframe.keys())

#Now we will remove our target attribute from the attribute names list
attribute_names.remove(target) 
print('Our Predicting Attributes ---> ', attribute_names)

Our Target Attribute is --->  Play Tennis
Our Predicting Attributes --->  ['Outlook', 'Temperature', 'Humidity']


In [3]:
# To build a decision tree we must compare the impurity of each attribute
# So we have to find out what is the entropy of our collection
# For this we can define a function to calculate the entropy as the sum of -x*log2(x)

import math  
def entropy(probabilities):  
    return sum( [-probability*math.log(probability, 2) for probability in probabilities])

# We have to calculate the entropy of the dataset w.r.t target attribute
# We will do the calculation with the help of a function

def entropy_of_list(lst,value):  
    from collections import Counter
    
    #Initialize Total instances associated with respective attribute
    total_instances = len(lst)  
    print(".~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.")
    print("\nTotal no of instances associated with '{0}' is ---> {1}".format(value,total_instances))
    # Now we will calculate the proportion of class using count variable
    count = Counter(x for x in lst)
    print('\nTarget attribute class count(Yes/No)=',dict(count))
    
    # Here, x is the number of instances of each class (Yes/No)
    probabilities = [x / total_instances for x in count.values()]  
    print("\nClasses --->", list(count))
    # Print each class next to its own probability
    for label, n in count.items():
        print("Probability of class '{0}' ---> {1}".format(label, n / total_instances))
    
    return entropy(probabilities)

In [4]:
# Now we will calculate Information Gain using a function
# The function will find out difference in Entropy before and after splitting the dataset

def information_gain(my_dataframe, split_attribute, target, battr):
    print("\n\n.~.~.~.~. Information Gain Calculation of",split_attribute,".~.~.~.~. ") 
    
    # group the data based on attribute values
    my_dataframe_split = my_dataframe.groupby(split_attribute) 
    glist=[]
    for gname,group in my_dataframe_split:
        print('Our Grouped Attribute Values \n',group)
        print(".~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.")
        glist.append(gname) 
    
    #Let's calculate the entropy and proportion 
    glist.reverse()
    nobs = len(my_dataframe.index) * 1.0   
    my_dataframe_agg1=my_dataframe_split.agg({target :lambda x:entropy_of_list(x, glist.pop())})
    my_dataframe_agg2=my_dataframe_split.agg({target :lambda x:len(x)/nobs})
    
    my_dataframe_agg1.columns=['Entropy']
    my_dataframe_agg2.columns=['Proportion']
    
    # Calculate Information Gain:
    # We just have to find difference in new and old entropy
    new_entropy = sum( my_dataframe_agg1['Entropy'] * my_dataframe_agg2['Proportion'])
    if battr !='S':
        # Label the subset with the value of the attribute it was split on
        old_entropy = entropy_of_list(my_dataframe[target], 'S-' + str(my_dataframe[battr].iloc[0]))
    else:
        old_entropy = entropy_of_list(my_dataframe[target],battr)
    return old_entropy - new_entropy
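
For comparison, here is a more compact, print-free sketch of the same calculation. It is not the function used in the rest of this post: the names `class_entropy` and `info_gain` are hypothetical, and the small DataFrame holds only the Outlook and Play Tennis columns of the table above so that the block runs on its own.

```python
import numpy as np
import pandas as pd

def class_entropy(labels):
    """Entropy (base 2) of a pandas Series of class labels."""
    probabilities = labels.value_counts(normalize=True)
    return float(-(probabilities * np.log2(probabilities)).sum())

def info_gain(df, split_attribute, target):
    """Entropy of the target minus its weighted entropy after splitting."""
    weighted = sum(
        len(group) / len(df) * class_entropy(group[target])
        for _, group in df.groupby(split_attribute)
    )
    return class_entropy(df[target]) - weighted

df = pd.DataFrame({
    "Outlook": ["Sunny", "Overcast", "Rain", "Rain", "Overcast", "Sunny",
                "Sunny", "Rain", "Sunny", "Overcast", "Overcast"],
    "Play Tennis": ["No", "Yes", "Yes", "Yes", "Yes", "No",
                    "Yes", "Yes", "Yes", "Yes", "Yes"],
})
print(round(info_gain(df, "Outlook", "Play Tennis"), 4))
```

Iterating over the groups directly avoids keeping a separate list of group names in step with the aggregation.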

In [5]:
# Here goes our ID3 algorithm
# At each node we compute the information gain of every remaining attribute
# and pick the attribute with the highest gain as the node to split on.
# The dataset is then split on that attribute's values, and the function
# calls itself recursively on each subset to fill in the sub-trees.

def id3(my_dataframe, target, attribute_names, default_class=None,default_attr='S'):
    
    from collections import Counter
    count = Counter(x for x in my_dataframe[target])  # number of instances per class (Yes/No)
    
    # First we check whether this split of the dataset is homogeneous
    if len(count) == 1:
        return next(iter(count))  # only one class is left, so return it as a leaf
    
    # Then we check whether this split is empty or no attributes are left to split on.
    # If so, we return the default value
    elif my_dataframe.empty or (not attribute_names):
        return default_class  # majority class of the parent (None at the top level)
    
    # Otherwise we can split this dataset further
    else:
        # Set a default value for the next recursive call of this function:
        # the most frequent class in the current dataset
        default_class = max(count, key=count.get)
        # Computing the Information Gain of the attributes:
        gains=[]
        for attr in attribute_names:
            ig= information_gain(my_dataframe, attr, target, default_attr)
            gains.append(ig)
            print('\nInformation gain of','“',attr,'”','is ---> ', ig)
            print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
            print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
         
        index_of_max = gains.index(max(gains))               # Index of Best Attribute
        best_attr = attribute_names[index_of_max]            # Choose Best Attribute to split on
        print("\nList of gains for attributes:", attribute_names,"\nare:", gains,"respectively.")
        print("\nAttribute with the maximum gain is ---> ", best_attr)
        print("\nHence, the node to split on will be ---> ", best_attr)
        print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
        print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")

        # We will Create an empty tree first
        # Gradually it will become populated 
        tree = {best_attr:{}} # Initiating the tree with the best attribute as a node 
        remaining_attribute_names =[i for i in attribute_names if i != best_attr]
        
        # Complete the tree
        for attr_val, data_subset in my_dataframe.groupby(best_attr):
            subtree = id3(data_subset,target, remaining_attribute_names,default_class,best_attr)
            tree[best_attr][attr_val] = subtree
        return tree

In [6]:
#Now we will form our tree
#Let's print our tree by calling the id3() function

from pprint import pprint
tree = id3(my_dataframe,target,attribute_names)
print("\nThe Resultant Decision Tree is:")
pprint(tree)



.~.~.~.~. Information Gain Calculation of Outlook .~.~.~.~. 
Our Grouped Attribute Values 
      Outlook Temperature Humidity Play Tennis
1   Overcast         Hot     High         Yes
4   Overcast        Cool   Normal         Yes
9   Overcast        Mild     High         Yes
10  Overcast         Hot   Normal         Yes
.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
Our Grouped Attribute Values 
   Outlook Temperature Humidity Play Tennis
2    Rain        Mild     High         Yes
3    Rain        Cool   Normal         Yes
7    Rain        Mild   Normal         Yes
.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
Our Grouped Attribute Values 
   Outlook Temperature Humidity Play Tennis
0   Sunny         Hot     High          No
5   Sunny        Mild     High          No
6   Sunny        Cool   Normal         Yes
8   Sunny        Mild   Normal         Yes
.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
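
Once a tree has been built, classifying a new sample means walking the nested dictionary from the root to a leaf. The sketch below assumes the `{attribute: {value: subtree_or_label}}` shape that `id3()` returns; the `classify` helper and the `example_tree` literal are hypothetical, hand-written for illustration rather than copied from the run.

```python
def classify(tree, sample, default=None):
    """Follow the sample's attribute values down the tree until a leaf is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))          # the attribute tested at this node
        value = sample.get(attribute)
        if value not in tree[attribute]:      # unseen value: fall back to the default
            return default
        tree = tree[attribute][value]
    return tree

# Hand-written example tree in the nested-dict shape (an assumption for illustration)
example_tree = {
    "Outlook": {
        "Overcast": "Yes",
        "Rain": "Yes",
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    }
}

print(classify(example_tree, {"Outlook": "Sunny", "Humidity": "High"}))
print(classify(example_tree, {"Outlook": "Rain", "Humidity": "High"}))
```

Returning a default for attribute values the tree has never seen is a design choice of this sketch; raising an error would be an equally reasonable alternative.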

We have seen how the ID3 algorithm works: at each step it selects the attribute with the highest information gain (equivalently, the lowest average entropy after the split) and then repeats on each branch. It tends to build small trees, but there is no guarantee that the result is the smallest possible tree. Because ID3 is greedy, it can also get stuck in a local optimum. In spite of these shortcomings, ID3 is a useful algorithm for building a decision tree from a given dataset.