# Decision Trees and Linear Classifiers

This Jupyter notebook covers the following topics:

* ID3 algorithm
* Perceptron

## Note

This notebook is a bit more code-heavy than the previous one. Don't let that overwhelm you if you're not that experienced: the structure is already given, key parts are explained by comments, and only small bits in between need to be added.

In terms of new programming concepts, this notebook makes use of functions and recursion. Below is a minimal example of each.

In [None]:
# A function
def add_one_and_double(x):
    y = x + 1
    z = y * 2
    return z

print(add_one_and_double(3))

# A recursive function
def factorial(n):
    if n <= 1:  # base case (0! = 1! = 1); also stops the recursion for n == 0
        return 1
    else:
        return n * factorial(n-1)  # <-- here the function calls itself
    
print(factorial(5))

## ID3 Algorithm

### Data Preparation

In [None]:
import numpy as np
import pandas as pd
from numpy import log2 as log
eps = np.finfo(float).eps  # a number very close to zero that we'll use further down

We use a data set on student performance in this exercise. If you want to learn more about the data, check the following link: https://archive.ics.uci.edu/ml/datasets/student+performance

We're going to use only 6 attributes to make the result more illustrative. For the same reason we're renaming some values and columns.


In [None]:
grades = pd.read_csv('school_grades_weka_dataset.csv')
print(grades.shape)
grades = grades.iloc[:,[1,2,3,4,20,32]]  # select only the columns at the given indexes
grades

We need to convert the age attribute from numeric to binary. To find an appropriate split we're using the median and mean functions from the numpy package.

In [None]:
print(np.median(grades.age))
print(np.mean(grades.age))

It seems that 17 is a good split point.

In [None]:
grades = grades.rename(columns={"G3":'fin_grade'})

# list comprehensions to make the values more illustrative
grades.address = ["urban" if x == 'U' else "rural" for x in grades.address]
grades.famsize = [">3" if x == 'GT3' else "<=3" for x in grades.famsize]
grades.sex = ['female' if x == 'F' else 'male' for x in grades.sex]
grades.age = ['>=17' if x >= 17 else '<17' for x in grades.age ]

In [None]:
# map the numeric final grade to a label; the first matching threshold wins
grades.fin_grade = [
    'excellent' if x >= 18 else
    'very good' if x >= 16 else
    'good' if x >= 14 else
    'sufficient' if x >= 10 else
    'weak' if x >= 4 else
    'poor'
    for x in grades.fin_grade
]

The final attributes and their values:

* sex - student's sex (binary: 'female' or 'male')
* age - student's age (binary: '<17' or '>=17')
* address - student's home address type (binary: 'urban' or 'rural')
* famsize - family size (binary: '<=3' or '>3')
* higher - wants to take higher education (binary: 'yes' or 'no')
* fin_grade - final grade (nominal: 'excellent', 'very good', 'good', 'sufficient', 'weak', 'poor')



Our dataset is a bit imbalanced, with more than half of the final grades equal to 'sufficient'. We're adjusting the dataset manually to get a more illustrative example of the decision tree.
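A quick way to see such an imbalance, and the effect of downsampling the majority class, is sketched here on a made-up toy frame. The `toy` data and the fixed `random_state` are assumptions for illustration only and are not part of the exercise data.

```python
import pandas as pd

# Hypothetical toy data: 6 of 10 rows share the same grade.
toy = pd.DataFrame({"fin_grade": ["sufficient"] * 6 + ["good"] * 2 + ["weak"] * 2})

# Relative class frequencies; 'sufficient' makes up 60% of the rows.
shares_before = toy.fin_grade.value_counts(normalize=True)
print(shares_before)

# Keep only 2 randomly chosen 'sufficient' rows plus all other rows.
majority = toy[toy.fin_grade == "sufficient"].sample(n=2, random_state=0)
rest = toy[toy.fin_grade != "sufficient"]
balanced = pd.concat([majority, rest], ignore_index=True)

counts_after = balanced.fin_grade.value_counts()
print(counts_after)
```

Passing `random_state` makes the random selection repeatable; leaving it out gives a different subset on every run.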

In [None]:
subset_wo_suff = grades[grades.fin_grade != 'sufficient']  # filter grades different from 'sufficient'
subset_w_suff = grades[grades.fin_grade == 'sufficient'].sample(n = 90).reset_index(drop = True)  # filter 'sufficient' grades and keep only 90 rows
grades = pd.concat([subset_w_suff, subset_wo_suff], ignore_index=True)  # merge the two dataframes (DataFrame.append was removed in pandas 2.0)
grades = grades.sample(frac=1).reset_index(drop=True)  # shuffle the final dataframe
print(grades.shape)
grades.head()

### ID3 Algorithm Implementation



**Task:** Complete the code below to calculate entropy for a given dataframe.
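As a reminder of the quantity involved, here is a small sketch of Shannon entropy on plain Python lists. The helper name `toy_entropy` and the label lists are made up for illustration; the notebook's own function works on the target column of the dataframe instead.

```python
from collections import Counter
from math import log2

def toy_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    # Sum -p * log2(p) over the relative frequency p of each class.
    return sum(-(count / total) * log2(count / total)
               for count in Counter(labels).values())

pure = toy_entropy(["good", "good", "good", "good"])       # one class only
even = toy_entropy(["good", "good", "weak", "weak"])       # two equally likely classes
four = toy_entropy(["good", "weak", "poor", "excellent"])  # four equally likely classes
print(pure, even, four)
```

A set with a single class has entropy 0, two equally likely classes give 1 bit, and four equally likely classes give 2 bits.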

In [None]:
def calculate_entropy(df):
    
    target = df.keys()[-1]  # get the name of target column
    entropy = 0
    target_values = df[target].unique()  # get unique values of the target column {'excellent', 'very good', ...}
    
    for value in target_values:
        # df[column].value_counts()[value] returns number of rows with the specific value in the given column
        relative_frequency_value = df[target].value_counts()[value] / len(df[target]) 
        entropy += # ...
        
    return entropy

**Task:** Complete the code below to calculate the weighted average entropy of the target after splitting the dataframe on a given attribute.

*Note:* We're using eps to avoid taking log(0) or dividing by 0.
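The effect of eps can be checked in isolation. The sketch below uses the same `eps` constant as defined earlier in the notebook; the variable names are made up for illustration.

```python
import numpy as np

eps = np.finfo(float).eps  # same constant as defined earlier in the notebook

# Without eps: log2(0) is -inf, and 0 * -inf is nan, which would poison the sum.
with np.errstate(divide="ignore", invalid="ignore"):
    term_without_eps = -0.0 * np.log2(0.0)

# With eps: the logarithm stays finite, so the term is simply zero.
term_with_eps = -0.0 * np.log2(0.0 + eps)

# An empty partition gives 0 / (eps + 0) = 0.0 instead of a division by zero.
ratio_with_eps = 0 / (eps + 0)

print(term_without_eps, term_with_eps, ratio_with_eps)
```

Because eps is tiny compared to any non-zero relative frequency, adding it barely changes the other terms of the sum.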

In [None]:
def calculate_entropy_attribute(df, attribute):
    
    target = df.keys()[-1]  # get the name of target column
    target_values = df[target].unique()  # get unique values of the target column {'excellent', 'very good', ...}
    
    attribute_values = df[attribute].unique()  # get unique values of the attribute column
    average_entropy = 0
    
    for att_value in attribute_values:
        
        set_entropy = 0
        
        for target_value in target_values:
            
            num_elements_in_class = # ...
            num_elements_in_set = # ...
            relative_frequency = num_elements_in_class/(eps+num_elements_in_set) 
            set_entropy += -relative_frequency*log(eps+relative_frequency)
        
        partition_weight = # ...
        average_entropy += # ...
        
    return abs(average_entropy)

**Task:** Complete the code below to find the attribute with the maximum information gain.
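Information gain is the entropy of the whole set minus the weighted average entropy after a split. The following sketch illustrates this on a tiny hand-made table of tuples; the helper names and the rows are assumptions for illustration and deliberately do not use the dataframe functions from this notebook.

```python
from collections import Counter
from math import log2

def toy_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(count / total) * log2(count / total)
               for count in Counter(labels).values())

def toy_information_gain(rows, attribute_index):
    """rows: list of tuples whose last element is the class label."""
    labels = [row[-1] for row in rows]
    # Group the class labels by the value of the chosen attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute_index], []).append(row[-1])
    # Weighted average entropy of the partitions.
    remainder = sum(len(part) / len(rows) * toy_entropy(part)
                    for part in partitions.values())
    return toy_entropy(labels) - remainder

# Hypothetical rows: (address, higher, fin_grade)
toy_rows = [
    ("urban", "yes", "good"),
    ("urban", "no", "good"),
    ("rural", "yes", "weak"),
    ("rural", "no", "weak"),
]
gain_address = toy_information_gain(toy_rows, 0)
gain_higher = toy_information_gain(toy_rows, 1)
print(gain_address, gain_higher)
```

In this toy table the first attribute separates the classes perfectly, so its gain equals the full entropy of 1 bit, while the second attribute tells us nothing about the class and has a gain of 0.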

In [None]:
def find_max_gain(df):
    
    gains = []
    attributes = df.keys()[:-1]  # get names of all attributes columns
    
    for attribute in attributes:
        
        gain = # ...
        gains.append(gain)
    
    return attributes[np.argmax(gains)], np.max(gains)  # np.argmax(array) returns index of max element

**Task:** Complete the code below to build the decision tree

In [None]:
def decision_tree(df, tree=None):
    
    target = df.keys()[-1]
    node, gain = find_max_gain(df) 
    attributes = np.unique(df[node])
    
    if tree is None:                    
        tree={}
        tree[node] = {}
        
    for value in attributes:
        
        subtable = df[df[node] == value].reset_index(drop=True) 
        subtable.drop(columns = [node], inplace = True)  # delete the current node column
        
        # np.unique() returns sorted unique elements of the array/column and their counts
        clValue, counts = np.unique(subtable[target], return_counts=True)
        
        if len(counts)==1 or len(subtable.columns) == 1:  # Check if the node is the terminal/leaf node
            tree[node][value] = clValue[#...]                                            
        else:        
            tree[node][value] = # ...
                   
    return tree

We use the pprint module to print a formatted representation of the tree, which is stored as a nested dictionary.

In [None]:
import pprint

tree = decision_tree(grades)
    
pprint.pprint(tree)
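Since the tree is a nested dictionary of the form `{attribute: {value: subtree_or_label}}`, a sample can be classified by walking it from the root to a leaf. The sketch below assumes exactly that shape; the function name `classify` and the hand-written `toy_tree` are made up for illustration and are not the tree learned from the data.

```python
def classify(tree, sample):
    """Walk a nested-dict decision tree until a leaf (a non-dict value) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))      # each tree level has a single attribute key
        branches = tree[attribute]
        tree = branches[sample[attribute]]  # follow the branch matching the sample's value
    return tree

# Hypothetical tree, written by hand in the same shape as the printed one
toy_tree = {"higher": {"no": "weak",
                       "yes": {"age": {"<17": "good", ">=17": "sufficient"}}}}

prediction = classify(toy_tree, {"higher": "yes", "age": "<17"})
print(prediction)
```

Note that this sketch raises a `KeyError` for attribute values that do not appear in the tree.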

**Task:** Modify the code above to build a decision tree with a given maximum depth.


In [None]:
def decision_tree_with_max_depth(df, tree=None, max_depth=None):
    
    if max_depth is None:
        #...
    
    if tree is None:                    
        tree={}
        tree[node] = {}
        
    for value in attributes:
        
        subtable = df[df[node] == value].reset_index(drop=True)
        subtable.drop(columns = [node], inplace = True)  # delete the current node column
        
        # np.unique() returns sorted unique elements of the array/column and their counts
        clValue,counts = np.unique(subtable[target], return_counts=True)   
        
        if len(counts)==1 or #...
            tree[node][value] = clValue[#...]                                                    
        else:        
            tree[node][value] = #...
                   
    return tree

In [None]:
tree_max_depth = decision_tree_with_max_depth(grades, max_depth = 3)
    
pprint.pprint(tree_max_depth)

## Perceptron

The perceptron is an algorithm for supervised learning of binary classifiers. It learns a weight vector and a bias that together define a linear decision boundary.

### sklearn.linear_model.Perceptron

First, we're going to look at the perceptron implementation in the sklearn package.

The code below defines our data set and plots it using the matplotlib package.

In [None]:
import matplotlib.pyplot as plt

X1 = np.array([[ 0.33779625,  0.43771315],
       [-3.04215519,  0.44362234],
       [ 1.55633835,  1.50277908],
       [-1.73490571,  1.6579759 ],
       [-1.73168615,  1.49470015],
       [ 0.8667018 ,  0.41225495],
       [-2.87771733,  0.86954988],
       [ 0.75223565,  0.08440232],
       [-1.47945738,  1.2616705 ],
       [-0.4785672 ,  1.19985036],
       [ 1.62548382,  1.1795993 ],
       [-0.84739376,  0.2028666 ],
       [ 2.13363607,  1.14552905],
       [ 0.27267319,  0.38029338],
       [ 2.14503557,  0.22562339],
       [ 1.36963359, -0.13479744],
       [-0.41267018,  1.77212241],
       [-2.9671909 ,  1.72121792],
       [-1.32503909,  1.28815191],
       [ 1.80576425,  0.99594517],
       [-1.97679324,  0.27128576],
       [-1.74128134,  2.56158409],
       [ 1.58181139,  0.76777454],
       [-0.21715638,  2.41168968],
       [ 0.24130261,  0.68114926],
       [ 2.93403797, -0.70272327],
       [ 0.54918186, -0.9246533 ],
       [-0.94476044,  0.79698854],
       [-0.20668873, -0.29360169],
       [ 2.07157736, -1.68353633],
       [-1.32415501,  2.02986065],
       [ 1.11469681,  0.88390513],
       [-0.17081595,  2.95303183],
       [ 1.81382203,  0.58587021],
       [ 0.42420337,  0.87468868],
       [-1.38599572,  1.79387082],
       [ 0.56997773,  0.84988166],
       [ 0.03764063,  0.70148957],
       [ 0.85092623,  0.22867238],
       [-0.82355759,  1.86393625]])
Y1 = np.array([0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 
              1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1])

plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1, s=20, edgecolor='k')  # plot the given points

In [None]:
from sklearn.linear_model import Perceptron

ppr = Perceptron()
ppr.fit(X1, Y1)  # this is where the learning happens

**Task:** Complete the code below to find the weights and bias that were learned by the perceptron

In [None]:
bias = # ...
weights = # ...
bias, weights

We want to add the decision boundary learned by the perceptron to the plot of data points. To do so, we need to find the slope and intercept of the line that separates the classes.

**Task:** Complete the code below to find the intercept and slope.

*Note:* You need to find the line that passes through two points:
* point_1 = ( 0 , - bias / weight_2 ) 
* point_2 = ( - bias / weight_1 , 0 ) 
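As a generic reminder, the slope and intercept of a line through two known points can be computed as sketched here. The helper `line_through` and the two example points are made up for illustration, and the sketch assumes the points have different x-coordinates.

```python
def line_through(point_a, point_b):
    """Return (slope, intercept) of the line y = slope * x + intercept through two points."""
    (x_a, y_a), (x_b, y_b) = point_a, point_b
    slope = (y_b - y_a) / (x_b - x_a)  # rise over run
    intercept = y_a - slope * x_a      # value of y at x = 0
    return slope, intercept

# Hypothetical points: one on the y-axis, one on the x-axis
slope_demo, intercept_demo = line_through((0.0, 2.0), (4.0, 0.0))
print(slope_demo, intercept_demo)
```

When one point lies on the y-axis, as in this example, its y-coordinate is already the intercept.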

In [None]:
intercept = # ...
slope = # ...
intercept, slope

The cell below plots the data points with the decision boundary.

In [None]:
x_min, x_max = X1[:, 0].min() - 1, X1[:, 0].max() + 1  # find the x-boundaries for the plot
y_min, y_max = X1[:, 1].min() - 1, X1[:, 1].max() + 1  # find the y-boundaries for the plot

a = np.linspace(x_min, x_max,100)  # returns 100 evenly spaced samples, calculated over the interval [x_min, x_max]
axes = plt.gca()
axes.set_ylim([y_min,y_max])  # set the y-boundaries
axes.set_xlim([x_min,x_max])  # set the x-boundaries

plt.plot(a, slope * a + intercept, 'b')  # plot the decision boundary
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1, s=20, edgecolor='k')

### Perceptron Training algorithm

Now we're going to implement the Perceptron Training algorithm from scratch.

We create data for classification using the make_blobs function from sklearn. As the data is created randomly, the classes will not always be linearly separable.

To start, execute the cell below repeatedly until you get two clearly separable blobs.

In [None]:
from sklearn import datasets

X, Y = datasets.make_blobs(n_samples=100, centers=2, n_features=2, center_box=(0, 10))
plt.plot(X[:, 0][Y == 0], X[:, 1][Y == 0], 'g^')  # plot the given points of first class
plt.plot(X[:, 0][Y == 1], X[:, 1][Y == 1], 'bs')  # plot the given points of second class
plt.show()
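If you want to get back to a particular blob configuration later, one option is to fix the seed with the `random_state` parameter of `make_blobs`. The seed value 42 below is an arbitrary assumption, and it is not guaranteed to produce separable blobs.

```python
import numpy as np
from sklearn import datasets

# Two calls with the same seed return exactly the same blobs.
X_a, Y_a = datasets.make_blobs(n_samples=100, centers=2, n_features=2,
                               center_box=(0, 10), random_state=42)
X_b, Y_b = datasets.make_blobs(n_samples=100, centers=2, n_features=2,
                               center_box=(0, 10), random_state=42)

same_blobs = np.array_equal(X_a, X_b) and np.array_equal(Y_a, Y_b)
print(same_blobs)
```

Trying a few different seeds and noting the ones you like gives you both a separable and an overlapping data set that you can reproduce at will.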

**Task:** Complete the code below to implement the perceptron training algorithm.

In [None]:
def perceptron(features, labels, learning_rate=1, max_num_iter=50):  
    
    # set weights and bias to zero
    weights = np.zeros(shape = features.shape[1])
    bias = 0
    
    num_iter = 0
    
    while True:
        
        misclassified = 0  # set number of misclassified elements in this iteration to zero
        
        for i in range(len(features)):
            x = features[i]  # get the features of the current point
            
            actual = labels[i]  # get the label of the current point
            
            predicted = # ...
            
            error = actual - predicted
            
            if predicted != actual:
                
                weights += # ...
                bias += # ...
                misclassified += # ...
            
        if max_num_iter <= 20:  # only log progress for short runs
            print(f" - NumIteration : {num_iter+1}. Misclassified : {misclassified}" )
        
        num_iter += 1
                  
        if misclassified == 0 or num_iter >= max_num_iter:
            break
            
    return weights, bias

In [None]:
weights, bias = perceptron(X,Y)
weights, bias

**Task:** Use the formula for intercept and slope from the example above to plot the decision boundary.

In [None]:
intercept = # ...
slope = # ...
intercept, slope

In [None]:
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1  # find the x-boundaries for the plot
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1  # find the y-boundaries for the plot

a = np.linspace(x_min, x_max,100)  # returns 100 evenly spaced samples, calculated over the interval [x_min, x_max]
axes = plt.gca()
axes.set_ylim([y_min,y_max])  # set the y-boundaries
axes.set_xlim([x_min,x_max])  # set the x-boundaries

plt.plot(a, slope * a + intercept, 'b')  # plot the decision boundary
plt.plot(X[:, 0][Y == 0], X[:, 1][Y == 0], 'g^')
plt.plot(X[:, 0][Y == 1], X[:, 1][Y == 1], 'bs')
plt.show()

**Task:** Re-run the blob creation cell until you get two blobs with very high overlap. If you then run the perceptron again and look at the decision boundary, it will probably look somewhat (or even completely) off. Tweak your perceptron function to achieve a better decision boundary.