# Vehicle silhouettes

## Objective
Classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many different angles.

## Description

The features were extracted from the silhouettes by the HIPS (Hierarchical Image Processing System) extension BINATTS, which extracts a combination of scale-independent features, using both classical moment-based measures such as scaled variance, skewness and kurtosis about the major/minor axes, and heuristic measures such as hollows, circularity, rectangularity and compactness.

Four "Corgie" model vehicles were used for the experiment: a double-decker bus, a Chevrolet van, a Saab 9000 and an Opel Manta 400. This combination of vehicles was chosen with the expectation that the bus, the van and either one of the cars would be readily distinguishable, but that it would be more difficult to distinguish between the cars.

## Source
https://www.kaggle.com/rajansharma780/vehicle

## ATTRIBUTES
1.	compactness	float	average perimeter**2/area
2.	circularity	float	average radius**2/area
3.	distance_circularity	float	area/(av.distance from border)**2
4.	radius_ratio	float	(max.rad-min.rad)/av.radius
5.	pr_axis_aspect_ratio	float	(minor axis)/(major axis)
6.	max_length_aspect_ratio	float	(length perp. max length)/(max length)
7.	scatter_ratio	float	(inertia about minor axis)/(inertia about major axis)
8.	elongatedness	float	area/(shrink width)**2
9.	pr_axis_rectangularity	float	area/(pr.axis length*pr.axis width)
10.	max_length_rectangularity	float	area/(max.length*length perp. to this)
11.	scaled_variance_major_axis	float	(2nd order moment about minor axis)/area
12.	scaled_variance_minor_axis	float	(2nd order moment about major axis)/area
13.	scaled_radius_gyration	float	(mavar+mivar)/area
14.	skewness_major_axis	float	(3rd order moment about major axis)/sigma_min**3
15.	skewness_minor_axis	float	(3rd order moment about minor axis)/sigma_maj**3
16.	kurtosis_minor_axis	float	(4th order moment about major axis)/sigma_min**4
17.	kurtosis_major_axis	float	(4th order moment about minor axis)/sigma_maj**4
18.	hollows_ratio	float	(area of hollows)/(area of bounding polygon)

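## Target variable
19.	vehicle_class	string	Target class. Values: Opel, Saab, Bus, Van. In the `vehicle.csv` loaded below the column is named `class`; the displayed rows use `bus`, `car` and `van`, and the code treats the problem as three-class.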

# Tasks
1.	Obtain the multi-class dataset from the given link
2.	Load the dataset
3.	Apply pre-processing techniques: encoding, scaling
4.	Divide the dataset into training (70%) and testing (30%) sets
5.	Build your own random forest model from scratch (using the individual decision tree model from sklearn)
6.	Train the random forest model
7.	Test the random forest model
8.	Train and test the random forest model using sklearn
9.	Compare the performance of both models

## Useful links
https://machinelearningmastery.com/implement-random-forest-scratch-python/

https://towardsdatascience.com/random-forests-and-decision-trees-from-scratch-in-python-3e4fa5ae4249

https://www.analyticsvidhya.com/blog/2018/12/building-a-random-forest-from-scratch-understanding-real-world-data-products-ml-for-programmers-part-3/

# Part 1: Random Forest from scratch

Random forests are an ensemble learning method for classification and regression that operate by constructing multiple decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
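To make the two aggregation rules concrete, here is a minimal sketch with made-up per-tree predictions (the arrays `class_votes` and `tree_values` are illustrative and are not taken from the vehicle data). It shows the mode for classification and the mean for regression.

```python
import numpy as np

# Hypothetical predictions from three trees for four samples.
# Rows are trees, columns are samples.
class_votes = np.array([
    [0, 1, 2, 1],
    [0, 2, 2, 1],
    [1, 1, 2, 0],
])

# Classification: the forest outputs the most frequent label per sample.
# np.bincount counts the votes per label; argmax breaks ties toward the lowest label.
forest_labels = [int(np.bincount(col).argmax()) for col in class_votes.T]

# Regression: the forest outputs the mean of the trees' numeric predictions.
tree_values = np.array([
    [2.0, 3.5],
    [4.0, 2.5],
    [3.0, 3.0],
])
forest_values = tree_values.mean(axis=0)

print(forest_labels)
print(forest_values)
```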

In [1]:
# Load the libraries
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier,VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler,LabelEncoder
from sklearn.metrics import accuracy_score,precision_score,recall_score,f1_score
from sklearn import tree
import random

In [2]:
# Load the dataset 
data=pd.read_csv("vehicle.csv")
data

Unnamed: 0,compactness,circularity,distance_circularity,radius_ratio,pr.axis_aspect_ratio,max.length_aspect_ratio,scatter_ratio,elongatedness,pr.axis_rectangularity,max.length_rectangularity,scaled_variance,scaled_variance.1,scaled_radius_of_gyration,scaled_radius_of_gyration.1,skewness_about,skewness_about.1,skewness_about.2,hollows_ratio,class
0,95,48.0,83.0,178.0,72.0,10,162.0,42.0,20.0,159,176.0,379.0,184.0,70.0,6.0,16.0,187.0,197,van
1,91,41.0,84.0,141.0,57.0,9,149.0,45.0,19.0,143,170.0,330.0,158.0,72.0,9.0,14.0,189.0,199,van
2,104,50.0,106.0,209.0,66.0,10,207.0,32.0,23.0,158,223.0,635.0,220.0,73.0,14.0,9.0,188.0,196,car
3,93,41.0,82.0,159.0,63.0,9,144.0,46.0,19.0,143,160.0,309.0,127.0,63.0,6.0,10.0,199.0,207,van
4,85,44.0,70.0,205.0,103.0,52,149.0,45.0,19.0,144,241.0,325.0,188.0,127.0,9.0,11.0,180.0,183,bus
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
841,93,39.0,87.0,183.0,64.0,8,169.0,40.0,20.0,134,200.0,422.0,149.0,72.0,7.0,25.0,188.0,195,car
842,89,46.0,84.0,163.0,66.0,11,159.0,43.0,20.0,159,173.0,368.0,176.0,72.0,1.0,20.0,186.0,197,van
843,106,54.0,101.0,222.0,67.0,12,222.0,30.0,25.0,173,228.0,721.0,200.0,70.0,3.0,4.0,187.0,201,car
844,86,36.0,78.0,146.0,58.0,7,135.0,50.0,18.0,124,155.0,270.0,148.0,66.0,0.0,25.0,190.0,195,car


In [3]:
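# Preprocessing
# 1. Encode the categorical target (LabelEncoder sorts labels alphabetically: bus=0, car=1, van=2)
# 2. Scale every feature to [0, 1]
# 3. Fill missing values (if any)
le=LabelEncoder()
temp=le.fit_transform(data["class"])
minmax=MinMaxScaler()
# Note: the scaler is fitted on the full dataset, before the train/test split
data=pd.DataFrame(minmax.fit_transform(data.iloc[:,:-1]),columns=data.iloc[:,:-1].columns)
data["class"]=temp
# MinMaxScaler passes NaNs through, so they are replaced here; after scaling, 0 is the column minimum
data=data.fillna(0)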
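
The cell above scales and fills the whole dataset before it is split, so statistics from the test rows leak into the training features. One possible alternative, sketched here on a tiny synthetic frame, splits first and fits the imputation and the scaler on the training rows only. The frame `demo`, its column names and the choice of median imputation are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Tiny synthetic stand-in for vehicle.csv (column names and values are made up).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feat_a": rng.uniform(30, 120, size=20),
    "feat_b": rng.uniform(100, 300, size=20),
    "class": ["bus", "car", "van", "car"] * 5,
})
demo.loc[3, "feat_b"] = np.nan  # one missing value to impute

features = demo.drop(columns="class")
labels = LabelEncoder().fit_transform(demo["class"])  # bus=0, car=1, van=2

# Split first; stratify keeps the class proportions similar in both parts.
f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.3, random_state=0, stratify=labels
)

# Impute with training-set medians, then fit the scaler on training rows only.
medians = f_train.median()
f_train = f_train.fillna(medians)
f_test = f_test.fillna(medians)

scaler = MinMaxScaler().fit(f_train)
f_train_scaled = pd.DataFrame(
    scaler.transform(f_train), columns=features.columns, index=f_train.index
)
f_test_scaled = pd.DataFrame(
    scaler.transform(f_test), columns=features.columns, index=f_test.index
)
```

With this ordering, scaled test values can fall slightly outside [0, 1], which is expected because the scaler never saw those rows.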

In [4]:
data

Unnamed: 0,compactness,circularity,distance_circularity,radius_ratio,pr.axis_aspect_ratio,max.length_aspect_ratio,scatter_ratio,elongatedness,pr.axis_rectangularity,max.length_rectangularity,scaled_variance,scaled_variance.1,scaled_radius_of_gyration,scaled_radius_of_gyration.1,skewness_about,skewness_about.1,skewness_about.2,hollows_ratio,class
0,0.478261,0.576923,0.597222,0.323144,0.274725,0.150943,0.326797,0.457143,0.250000,0.585714,0.242105,0.233813,0.471698,0.144737,0.272727,0.390244,0.366667,0.533333,2
1,0.391304,0.307692,0.611111,0.161572,0.109890,0.132075,0.241830,0.542857,0.166667,0.357143,0.210526,0.175060,0.308176,0.171053,0.409091,0.341463,0.433333,0.600000,2
2,0.673913,0.653846,0.916667,0.458515,0.208791,0.150943,0.620915,0.171429,0.500000,0.571429,0.489474,0.540767,0.698113,0.184211,0.636364,0.219512,0.400000,0.500000,1
3,0.434783,0.307692,0.583333,0.240175,0.175824,0.132075,0.209150,0.571429,0.166667,0.357143,0.157895,0.149880,0.113208,0.052632,0.272727,0.243902,0.766667,0.866667,2
4,0.260870,0.423077,0.416667,0.441048,0.615385,0.943396,0.241830,0.542857,0.166667,0.371429,0.584211,0.169065,0.496855,0.894737,0.409091,0.268293,0.133333,0.066667,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
841,0.434783,0.230769,0.652778,0.344978,0.186813,0.113208,0.372549,0.400000,0.250000,0.228571,0.368421,0.285372,0.251572,0.171053,0.318182,0.609756,0.400000,0.466667,1
842,0.347826,0.500000,0.611111,0.257642,0.208791,0.169811,0.307190,0.485714,0.250000,0.585714,0.226316,0.220624,0.421384,0.171053,0.045455,0.487805,0.333333,0.533333,2
843,0.717391,0.807692,0.847222,0.515284,0.219780,0.188679,0.718954,0.114286,0.666667,0.785714,0.515789,0.643885,0.572327,0.144737,0.136364,0.097561,0.366667,0.666667,1
844,0.282609,0.115385,0.527778,0.183406,0.120879,0.094340,0.150327,0.685714,0.083333,0.085714,0.131579,0.103118,0.245283,0.092105,0.000000,0.609756,0.466667,0.466667,1


In [5]:
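# Features are every column except the last; the encoded class is the last column
x=data.iloc[:,:-1]
y=data.iloc[:,-1]  # 1-D Series; a single-column DataFrame makes some sklearn estimators warn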

In [6]:
# Divide the dataset to training and testing set
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)

In [7]:
# Randomly choose features from the training set and build a decision tree on them
# Randomness in the features gives a different decision tree every time
# A minimum number of features is always kept so that each tree has enough to work with
# Note: sklearn's built-in decision tree can be used for the training itself
def choose(x_train,x_test,min_features=4):
    n_cols=len(x_train.columns)
    # Number of features for this tree: between min_features and n_cols-1
    n_feat=random.randint(min_features,n_cols-1)
    # The first min_features columns are always kept; the rest are drawn without replacement
    feat=list(range(min_features))
    feat+=random.sample(range(min_features,n_cols),n_feat-min_features)
    # Use the same column positions for train and test so each tree sees matching features
    return x_train.iloc[:,feat],x_test.iloc[:,feat]
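
The function above randomises only the columns. A standard random forest usually also trains each tree on a bootstrap sample of the rows. The sketch below shows one way to combine the two; the helper `bootstrap_sample` and the toy frame `X_demo` are hypothetical names introduced for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

def bootstrap_sample(X, y, n_features, rng):
    """Draw rows with replacement and a random subset of columns (illustrative helper)."""
    row_idx = rng.integers(0, len(X), size=len(X))                     # bootstrap rows
    col_idx = rng.choice(X.shape[1], size=n_features, replace=False)  # feature subset
    cols = X.columns[np.sort(col_idx)]
    return X.iloc[row_idx][cols], y.iloc[row_idx], list(cols)

rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.random((8, 5)), columns=list("abcde"))
y_demo = pd.Series([0, 1, 2, 0, 1, 2, 0, 1])

X_boot, y_boot, used_cols = bootstrap_sample(X_demo, y_demo, n_features=3, rng=rng)
print(used_cols)
print(X_boot.shape)
```

Returning `used_cols` matters for the same reason the notebook stores `random_x_test`: the tree must later be given the same columns at prediction time.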

In [8]:
# Train N number of decision trees using random feature selection strategy
# Number of trees N can be user input
n=int(input())
trees=[]
random_x_train=[None for i in range(n)]
random_x_test=[None for i in range(n)]
for i in range(n):
    random_x_train[i],random_x_test[i]=choose(x_train,x_test)
    dt=DecisionTreeClassifier()
    dt.fit(random_x_train[i],y_train)
    trees.append(dt)

In [9]:
def mode(p):
    # Majority vote: p[j][i] is the label predicted by tree j for sample i
    votes=np.asarray(p,dtype=int)
    fp=[]
    for i in range(votes.shape[1]):
        # Count the votes for classes 0, 1 and 2; argmax breaks ties toward the lowest label
        fp.append(int(np.bincount(votes[:,i],minlength=3).argmax()))
    return fp

In [10]:
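def average(p):
    # Element-wise mean of the predictions: p[j][i] is the prediction of tree j for sample i
    fp=[]
    for i in range(len(p[0])):
        avg=0
        for j in range(len(p)):
            avg+=p[j][i]
        # Divide by the number of trees rather than a hard-coded 10
        fp.append(avg/len(p))
    return fp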
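
Averaging integer labels treats the classes as if they were ordered: one vote for 0 and one vote for 2 average to 1, a class that no tree predicted. A common alternative is to average the per-class probabilities from `predict_proba` and take the most probable class. The sketch below uses synthetic data from `make_classification`; the names `demo_trees` and `demo_feats` are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the vehicle features.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Train a few shallow trees, each on its own random subset of 4 columns.
rng = np.random.default_rng(0)
demo_trees, demo_feats = [], []
for _ in range(5):
    cols = np.sort(rng.choice(X_demo.shape[1], size=4, replace=False))
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    demo_trees.append(tree.fit(X_demo[:, cols], y_demo))
    demo_feats.append(cols)

# Average the per-class probabilities over the trees, then pick the most probable class.
proba = np.mean(
    [t.predict_proba(X_demo[:, c]) for t, c in zip(demo_trees, demo_feats)],
    axis=0,
)
soft_vote = proba.argmax(axis=1)
```

Because every tree here is trained on all three classes, the column order of `predict_proba` is the same for each tree and the `argmax` index equals the class label.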

In [11]:
# Apply different voting mechanisms such as 
# max voting/average voting/weighted average voting (using accuracy as weightage)
# Perform the ensembling for the training set.
print("Max Voting")
pred=[]
for j in range(n):
    pred.append(trees[j].predict(random_x_train[j]))
fp=mode(pred)
print("Final Prediction",fp)
print("Average Voting")
pred=[]
for j in range(n):
    pred.append(trees[j].predict(random_x_train[j]))
fp=average(pred)
print("Final Prediction",fp)
print("Weighted Average Voting")
pred=[]
acc=[]
for j in range(n):
    pred.append(trees[j].predict(random_x_train[j]))
    acc.append(accuracy_score(y_train,pred[j]))
# Weight each tree by its accuracy and normalise by the sum of the weights,
# so the result stays on the same 0-2 label scale as the plain average
fp=np.average(np.array(pred),axis=0,weights=np.array(acc)).tolist()
print("Final Prediction",fp)

Max Voting
Final Prediction [1, 2, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 2, 1, 2, 1, 0, 1, 1, 2, 1, 2, 2, 1, 0, 0, 2, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 0, 2, 1, 0, 0, 2, 1, 0, 2, 0, 1, 1, 1, 2, 0, 1, 1, 2, 2, 1, 0, 0, 0, 2, 2, 0, 1, 2, 0, 1, 1, 2, 1, 1, 0, 2, 1, 1, 1, 1, 1, 1, 0, 1, 1, 2, 1, 2, 2, 0, 1, 1, 1, 2, 1, 0, 2, 2, 2, 0, 1, 0, 1, 2, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 0, 1, 1, 2, 2, 2, 2, 2, 0, 0, 0, 2, 1, 2, 1, 1, 1, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 0, 2, 0, 1, 0, 0, 1, 1, 2, 0, 0, 0, 1, 0, 0, 2, 2, 1, 1, 2, 0, 0, 1, 1, 2, 2, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 2, 2, 0, 1, 2, 1, 0, 0, 1, 2, 2, 0, 2, 2, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 2, 2, 1, 1, 0, 1, 1, 0, 0, 2, 1, 0, 0, 1, 1, 0, 2, 1, 1, 2, 0, 0, 0, 2, 2, 0, 1, 0, 0, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1, 0, 1, 2, 1, 1, 1, 1, 2,

In [12]:
# Apply the individual trained trees to the testing set
# Note: the feature sets used to train the individual trees must have been saved,
# so that the same features can be chosen from the testing set

# Get predictions on testing set
pred=[]
for j in range(n):
    print("Tree",(j+1))
    pred.append(trees[j].predict(random_x_test[j]))
    print(pred[j])

Tree 1
[1 0 1 2 1 1 1 0 1 0 1 0 2 0 0 1 2 2 1 1 2 1 0 1 2 1 2 0 2 0 0 2 1 2 1 1 0
 1 1 0 0 0 1 1 1 2 2 0 0 0 1 0 0 1 0 0 0 1 0 2 1 0 1 1 0 2 1 1 1 1 2 0 2 1
 1 1 2 1 1 2 1 1 2 1 1 1 0 2 0 2 1 1 1 1 2 0 2 0 0 2 1 1 0 1 0 2 1 1 1 1 1
 1 0 1 2 2 2 1 2 0 1 0 1 2 2 1 2 1 2 0 0 1 2 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 0 1 1 2 1 2 1 1 2 0 0 1 1 1 0 2 2 1 1 2 1 2 1 1 1 1 2 1 1 0 2 1 0 1 1 0
 1 0 0 1 1 0 2 1 2 2 1 1 2 1 2 0 1 0 1 1 1 2 0 2 1 2 1 2 1 2 0 2 0 1 0 1 2
 1 2 0 1 2 2 0 1 0 2 1 2 1 2 2 1 1 0 1 0 0 0 1 1 1 1 1 0 1 2 0 1]
Tree 2
[0 0 1 2 1 2 0 0 1 0 1 1 0 1 0 1 2 0 1 1 2 1 0 1 2 1 2 0 2 0 1 2 1 2 1 1 1
 1 1 0 0 0 1 1 1 2 2 0 0 0 1 0 1 1 0 0 1 2 1 2 1 0 1 1 0 1 2 1 1 0 2 1 2 1
 1 1 2 1 1 2 1 1 2 1 1 1 0 0 0 1 1 1 1 1 2 1 2 1 0 2 1 1 0 1 0 2 1 1 0 1 2
 1 0 1 2 2 2 2 2 1 1 1 1 2 2 1 2 1 2 0 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 2
 1 0 1 2 2 1 2 1 1 0 0 0 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 2 1 1 0 0 1 0 1 1 1
 1 0 0 1 1 2 2 1 2 2 1 1 2 1 2 0 1 0 1 1 1 2 0 2 1 2 1 2 1 2 0 1 0 0 0 1 2
 0 2 0 1 2 2 0 1 0 2

In [13]:
# Evaluate the results using accuracy, precision, recall and f-measure
for j in range(n):
    print("Tree",(j+1))
    print("Accuracy",accuracy_score(y_test,pred[j]))
    print("Precision",precision_score(y_test,pred[j],labels=[0,1,2],average="macro"))
    print("Recall",recall_score(y_test,pred[j],labels=[0,1,2],average="macro"))
    print("F1 score",f1_score(y_test,pred[j],labels=[0,1,2],average="macro"))

Tree 1
Accuracy 0.905511811023622
Precision 0.8956940836940838
Recall 0.9099961919525058
F1 score 0.902301747311828
Tree 2
Accuracy 0.8385826771653543
Precision 0.836283741368487
Recall 0.8268293765970588
F1 score 0.8313427010148322
Tree 3
Accuracy 0.8188976377952756
Precision 0.8027974879699862
Recall 0.8195940863579053
F1 score 0.8101241467222723
Tree 4
Accuracy 0.8464566929133859
Precision 0.8412238612439332
Recall 0.834397702791985
F1 score 0.8376821598404663
Tree 5
Accuracy 0.8937007874015748
Precision 0.8803037135618884
Recall 0.9083787809672978
F1 score 0.8920258004026284
Tree 6
Accuracy 0.8661417322834646
Precision 0.8691399945502946
Recall 0.8587120187630438
F1 score 0.8635352733939824
Tree 7
Accuracy 0.8740157480314961
Precision 0.8642449874030804
Recall 0.855129359238821
F1 score 0.8594285810173448
Tree 8
Accuracy 0.9094488188976378
Precision 0.9038240555751426
Recall 0.9128932683677741
F1 score 0.9075393421442791
Tree 9
Accuracy 0.8346456692913385
Precision 0.83241560261662
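
The scores above use `average="macro"`, which computes the metric for each class separately and then takes the unweighted mean. The sketch below reproduces macro precision by hand on a few made-up labels (`y_true` and `y_hat` are illustrative, not notebook outputs) and compares it with `precision_score`.

```python
import numpy as np
from sklearn.metrics import precision_score

# Made-up labels for three classes (0=bus, 1=car, 2=van).
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_hat = np.array([0, 1, 1, 1, 2, 2, 2, 0])

# Per-class precision = correct predictions of class c / all predictions of class c.
per_class = []
for c in [0, 1, 2]:
    predicted_c = (y_hat == c)
    per_class.append(((y_true == c) & predicted_c).sum() / predicted_c.sum())

# Macro average: every class counts equally, regardless of how many samples it has.
macro_by_hand = float(np.mean(per_class))
macro_sklearn = precision_score(y_true, y_hat, labels=[0, 1, 2], average="macro")

print(per_class)
print(macro_by_hand, macro_sklearn)
```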

In [14]:
# Compare different voting mechanisms and their accuracies
print("Max Voting")
pred=[]
for j in range(n):
    pred.append(trees[j].predict(random_x_test[j]))
fp=mode(pred)
print("Final Prediction",fp)
print("Accuracy",accuracy_score(y_test,fp))
print("Average Voting")
pred=[]
for j in range(n):
    pred.append(trees[j].predict(random_x_test[j]))
fp=average(pred)
print("Final Prediction",fp)
for i in range(len(fp)):
    if(fp[i]<=0.5):
        fp[i]=0
    elif(fp[i]>=1.5):
        fp[i]=2
    else:
        fp[i]=1
print("Accuracy",accuracy_score(y_test,fp))
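print("Weighted Average Voting")
pred=[]
acc=[]
for j in range(n):
    pred.append(trees[j].predict(random_x_test[j]))
    # Caution: these weights come from the test labels, so the accuracy below is optimistic
    acc.append(accuracy_score(y_test,pred[j]))
# Normalise by the sum of the weights so the result stays on the 0-2 label scale
fp=np.average(np.array(pred),axis=0,weights=np.array(acc)).tolist()
print("Final Prediction",fp)
# Round the weighted average back to the nearest class label
for i in range(len(fp)):
    if(fp[i]<=0.5):
        fp[i]=0
    elif(fp[i]>=1.5):
        fp[i]=2
    else:
        fp[i]=1
print("Accuracy",accuracy_score(y_test,fp))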

Max Voting
Final Prediction [2, 0, 1, 2, 1, 2, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 2, 0, 1, 1, 2, 1, 0, 1, 2, 1, 2, 0, 2, 0, 0, 2, 1, 2, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 2, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 2, 0, 2, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 2, 0, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 1, 0, 2, 0, 2, 1, 1, 1, 1, 2, 0, 2, 1, 0, 2, 1, 1, 0, 1, 0, 2, 1, 1, 1, 1, 1, 1, 0, 1, 2, 2, 2, 2, 2, 0, 1, 0, 1, 2, 2, 1, 2, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 2, 2, 1, 2, 1, 1, 2, 0, 0, 1, 1, 1, 0, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 0, 2, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 2, 1, 2, 2, 1, 1, 2, 1, 1, 0, 1, 0, 1, 1, 1, 2, 0, 2, 1, 2, 1, 2, 1, 2, 0, 1, 0, 0, 0, 1, 2, 0, 2, 0, 1, 2, 2, 0, 1, 0, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 0, 0, 2, 1, 1, 1, 1, 1, 0, 1, 2, 0, 1]
Accuracy 0.9251968503937008
Average Voting
Final Prediction [1.5, 0.0, 1.0, 1.9, 1.0, 1.7, 1.1, 0.1, 1.1, 0.2, 1.0, 0.2, 1.0, 0.7, 0.0, 1.0, 2.0, 0.4, 1.0, 1.0, 1.8, 1.0, 0.0, 1.0, 1.8, 1.0, 2.0, 0.0, 2.0, 0.0

In [None]:
# Compare the Random forest models with different number of trees N
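
This cell was left empty in the notebook. One way the comparison could be run is sketched below on synthetic data, so it does not depend on `vehicle.csv` or on the interactive `input()` call. The helper `forest_accuracy`, the feature count `n_feat=5` and the tested sizes are assumptions, and the resulting numbers say nothing about the vehicle dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=3, random_state=1
)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1)

def forest_accuracy(n_trees, n_feat=5, seed=0):
    """Train n_trees trees on random feature subsets and score their majority vote."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_trees):
        cols = rng.choice(Xtr.shape[1], size=n_feat, replace=False)
        tree = DecisionTreeClassifier(random_state=0).fit(Xtr[:, cols], ytr)
        votes.append(tree.predict(Xte[:, cols]))
    votes = np.array(votes)
    # Majority vote per test sample; ties go to the lowest label.
    majority = [np.bincount(votes[:, i], minlength=3).argmax() for i in range(votes.shape[1])]
    return accuracy_score(yte, majority)

results = {n_trees: forest_accuracy(n_trees) for n_trees in (1, 5, 10, 25)}
for n_trees, acc in results.items():
    print(n_trees, round(acc, 3))
```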

In [15]:
# Compare different values for minimum number of features needed for individual trees
for i in range(n):
    print("Tree",(i+1))
    print(random_x_train[i].columns)
    print(len(random_x_train[i].columns))

Tree 1
Index(['compactness', 'circularity', 'distance_circularity', 'radius_ratio',
       'skewness_about', 'scaled_radius_of_gyration.1', 'skewness_about.2',
       'scaled_variance.1', 'scaled_radius_of_gyration',
       'pr.axis_aspect_ratio', 'scaled_variance', 'max.length_aspect_ratio',
       'hollows_ratio', 'skewness_about.1', 'scatter_ratio',
       'max.length_rectangularity'],
      dtype='object')
16
Tree 2
Index(['compactness', 'circularity', 'distance_circularity', 'radius_ratio',
       'scatter_ratio', 'scaled_variance'],
      dtype='object')
6
Tree 3
Index(['compactness', 'circularity', 'distance_circularity', 'radius_ratio',
       'hollows_ratio', 'skewness_about.1', 'scaled_variance',
       'pr.axis_aspect_ratio'

# Part 2: Random Forest using Sklearn

In [16]:
# Use the preprocessed dataset here
x=data.iloc[:,:-1]
y=data.iloc[:,-1]  # 1-D Series, so RandomForestClassifier.fit does not warn about a column-vector y
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)

In [17]:
# Train the random forest model using sklearn's built-in RandomForestClassifier
rf=RandomForestClassifier()
rf.fit(x_train,y_train)

RandomForestClassifier()

In [18]:
# Test the model with testing set and print the accuracy, precision, recall and f-measure
pred=rf.predict(x_test)
print("Accuracy",accuracy_score(y_test,pred))
print("Precision",precision_score(y_test,pred,labels=[0,1,2],average="macro"))
print("Recall",recall_score(y_test,pred,labels=[0,1,2],average="macro"))
print("F1 score",f1_score(y_test,pred,labels=[0,1,2],average="macro"))

Accuracy 0.9803149606299213
Precision 0.9759305210918114
Recall 0.9814835196191128
F1 score 0.9784701701259646


In [19]:
# Play with parameters such as
# number of decision trees
# Criterion for splitting
# Max depth
# Minimum samples per split and leaf
rf=RandomForestClassifier(n_estimators=10)
rf.fit(x_train,y_train)
pred=rf.predict(x_test)
print("Accuracy",accuracy_score(y_test,pred))
print("Precision",precision_score(y_test,pred,labels=[0,1,2],average="macro"))
print("Recall",recall_score(y_test,pred,labels=[0,1,2],average="macro"))
print("F1 score",f1_score(y_test,pred,labels=[0,1,2],average="macro"))

Accuracy 0.952755905511811
Precision 0.9471653627692637
Recall 0.9551507814219679
F1 score 0.9507602118794883


In [20]:
rf=RandomForestClassifier(criterion="entropy")
rf.fit(x_train,y_train)
pred=rf.predict(x_test)
print("Accuracy",accuracy_score(y_test,pred))
print("Precision",precision_score(y_test,pred,labels=[0,1,2],average="macro"))
print("Recall",recall_score(y_test,pred,labels=[0,1,2],average="macro"))
print("F1 score",f1_score(y_test,pred,labels=[0,1,2],average="macro"))

Accuracy 0.968503937007874
Precision 0.9678437898023748
Recall 0.965251791522978
F1 score 0.9664366781405197


In [24]:
rf=RandomForestClassifier(max_depth=3)
rf.fit(x_train,y_train)
pred=rf.predict(x_test)
print("Accuracy",accuracy_score(y_test,pred))
print("Precision",precision_score(y_test,pred,labels=[0,1,2],average="macro"))
print("Recall",recall_score(y_test,pred,labels=[0,1,2],average="macro"))
print("F1 score",f1_score(y_test,pred,labels=[0,1,2],average="macro"))

Accuracy 0.8622047244094488
Precision 0.850806692003875
Recall 0.8777097039808904
F1 score 0.8610907610907611


In [27]:
rf=RandomForestClassifier(min_samples_leaf=3)
rf.fit(x_train,y_train)
pred=rf.predict(x_test)
print("Accuracy",accuracy_score(y_test,pred))
print("Precision",precision_score(y_test,pred,labels=[0,1,2],average="macro"))
print("Recall",recall_score(y_test,pred,labels=[0,1,2],average="macro"))
print("F1 score",f1_score(y_test,pred,labels=[0,1,2],average="macro"))

Accuracy 0.9724409448818898
Precision 0.9708610358696544
Recall 0.9736672618028551
F1 score 0.9719775613701999


In [30]:
rf=RandomForestClassifier(min_samples_split=4)
rf.fit(x_train,y_train)
pred=rf.predict(x_test)
print("Accuracy",accuracy_score(y_test,pred))
print("Precision",precision_score(y_test,pred,labels=[0,1,2],average="macro"))
print("Recall",recall_score(y_test,pred,labels=[0,1,2],average="macro"))
print("F1 score",f1_score(y_test,pred,labels=[0,1,2],average="macro"))

Accuracy 0.9763779527559056
Precision 0.9733664185277089
Recall 0.9789582670938604
F1 score 0.9759256408638781
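
The cells above change one parameter at a time and score each model on a single random split. A cross-validated grid search is one way to compare the same parameters jointly. The sketch below uses synthetic data and a small grid; the grid values and `cv=3` are assumptions chosen to keep the run short, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 3-class data standing in for the preprocessed vehicle features.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_classes=3, random_state=0
)

# The same parameters explored one by one in the cells above.
param_grid = {
    "n_estimators": [10, 50],
    "criterion": ["gini", "entropy"],
    "max_depth": [3, None],
    "min_samples_leaf": [1, 3],
}

# Score every combination with 3-fold cross-validation and macro-averaged F1.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1_macro",
    cv=3,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Fixing `random_state` makes repeated runs comparable, which the single-split cells above do not guarantee.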
