# Decision Tree Algorithm 👨🏻‍💻
---


## Scikit-learn Implementation
---

### `Imports`

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

### `Importing Dataset`

Next, we load the dataset from the CSV file into a pandas DataFrame. The file has no header row, so we supply the column names ourselves.

In [None]:
# Column names for the header-less data file
col = ['Class Name', 'Left weight', 'Left distance', 'Right weight', 'Right distance']
df = pd.read_csv('./data/balance-scale.data', names=col, sep=',')
df.head()

### `Information About Dataset`

We can get an overview of the dataset by calling the df.info() method. The output shows that it has 625 records and 5 columns.

In [None]:
df.info()

### `Exploratory Data Analysis (EDA)`




Let us do a bit of exploratory data analysis to understand the dataset better. We plot the class distribution with seaborn's countplot function. The first figure shows that most records are labelled R or L, meaning the scale tips right or left respectively. Very few records are labelled B, which stands for balanced. The next two plots break the class counts down by left weight and by right weight.


In [None]:
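# Pass columns by keyword; recent seaborn versions treat the first positional argument as `data`
sns.countplot(x='Class Name', data=df)

In [None]:
sns.countplot(x='Left weight', hue='Class Name', data=df)

In [None]:
sns.countplot(x='Right weight', hue='Class Name', data=df)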
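
The class imbalance seen in the first plot can also be checked numerically with value_counts. The sketch below does not read the CSV file. Instead it assumes the data follow the rule documented for the balance-scale dataset: every combination of the four attributes from 1 to 5, labelled by comparing left weight × left distance with right weight × right distance. The helper `make_balance_scale` and the name `demo_df` are hypothetical and used only for illustration.

```python
import itertools
import pandas as pd

def make_balance_scale():
    """Hypothetical helper: rebuild balance-scale-style data from the torque rule."""
    rows = []
    for lw, ld, rw, rd in itertools.product(range(1, 6), repeat=4):
        left, right = lw * ld, rw * rd
        # L: tips left, R: tips right, B: balanced
        label = 'L' if left > right else 'R' if left < right else 'B'
        rows.append((label, lw, ld, rw, rd))
    cols = ['Class Name', 'Left weight', 'Left distance',
            'Right weight', 'Right distance']
    return pd.DataFrame(rows, columns=cols)

demo_df = make_balance_scale()

# Absolute counts and relative shares per class
class_counts = demo_df['Class Name'].value_counts()
class_share = demo_df['Class Name'].value_counts(normalize=True).round(3)

print(demo_df.shape)
print(class_counts)
print(class_share)
```

With the real file, the same two value_counts calls can be applied to df['Class Name'] directly.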

### `Splitting the Dataset in Train-Test`




Before fitting the model, we split the data into training and test sets with the train_test_split function, holding out 30% of the records for testing.


In [None]:
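from sklearn.model_selection import train_test_split

# Features: every column except the label
X = df.drop('Class Name', axis=1)
# Target as a 1-D Series, the shape scikit-learn estimators expect
y = df['Class Name']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)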
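
Because the B class is rare, a plain random split can leave the training and test sets with different class proportions. One option, which the split above does not use, is the stratify argument of train_test_split. The sketch below shows it on made-up toy labels; the names `X_demo` and `y_demo` and the class proportions are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels (made up): 46 L, 46 R, 8 B
y_demo = np.array(['L'] * 46 + ['R'] * 46 + ['B'] * 8)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the class proportions roughly equal in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo)

train_share_B = np.mean(y_tr == 'B')
test_share_B = np.mean(y_te == 'B')
print(len(y_tr), len(y_te))
print(train_share_B, test_share_B)
```

Applying this to the notebook's data would mean adding stratify=y to the existing call, which changes which rows land in the test set and therefore the reported accuracy.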

### `Training the Decision Tree Classifier`

We use the Gini index as the split criterion and build the classifier with scikit-learn's DecisionTreeClassifier class.


We also pass random_state, max_depth, and min_samples_leaf to DecisionTreeClassifier(). Here max_depth=3 limits the tree to three levels of splits, and min_samples_leaf=5 requires at least five training samples in every leaf.


Finally, we train the model by calling clf_model.fit() on the training data.


In [None]:
from sklearn.tree import DecisionTreeClassifier
clf_model = DecisionTreeClassifier(criterion="gini", random_state=42, max_depth=3, min_samples_leaf=5)
clf_model.fit(X_train, y_train)
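
To make the Gini criterion concrete, the sketch below computes the Gini impurity of a node, 1 minus the sum of squared class proportions, and the size-weighted impurity of a candidate split. It is a hand-written illustration, not scikit-learn's internal code; the function names `gini_impurity` and `weighted_gini` and the toy labels are made up for this example.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of one node: 1 - sum over classes of p_k squared."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini(left_labels, right_labels):
    """Impurity of a split: child impurities weighted by child size."""
    n_left, n_right = len(left_labels), len(right_labels)
    n = n_left + n_right
    return ((n_left / n) * gini_impurity(left_labels)
            + (n_right / n) * gini_impurity(right_labels))

# Toy node with three L and three R samples
parent = ['L', 'L', 'L', 'R', 'R', 'R']
gini_parent = gini_impurity(parent)

# A split that separates the classes perfectly
gini_pure = weighted_gini(['L', 'L', 'L'], ['R', 'R', 'R'])

# A split that leaves both children mixed
gini_mixed = weighted_gini(['L', 'L', 'R'], ['L', 'R', 'R'])

print(gini_parent, gini_pure, gini_mixed)
```

A tree grown with criterion="gini" prefers the split with the lowest weighted impurity, so the perfectly separating split would be chosen over the mixed one here.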

### `Test Accuracy`



We now measure accuracy by running the classifier on the test data. First, we call clf_model.predict() with X_test as input.


In [None]:
y_predict = clf_model.predict(X_test)



Next, we use scikit-learn's accuracy_score function to calculate the accuracy. The model reaches an accuracy of 78.6% on the test data.


In [None]:
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix
accuracy_score(y_test,y_predict)
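
Accuracy alone can hide how a rare class such as B is handled. The cell above already imports confusion_matrix and classification_report, and the sketch below shows how they could be used. The labels in `y_true_demo` and `y_pred_demo` are made up for illustration and are not the notebook's results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up true and predicted labels, for illustration only
y_true_demo = ['L', 'L', 'L', 'R', 'R', 'R', 'B', 'B']
y_pred_demo = ['L', 'L', 'R', 'R', 'R', 'R', 'L', 'R']
labels = ['B', 'L', 'R']

# Rows are true classes, columns are predicted classes, both in `labels` order
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)
acc = accuracy_score(y_true_demo, y_pred_demo)

print(cm)
print(acc)
# zero_division=0 avoids a warning when a class is never predicted
print(classification_report(y_true_demo, y_pred_demo, labels=labels, zero_division=0))
```

In this toy case no sample is ever predicted as B, which the first column of the matrix makes visible even though five of the eight predictions are correct. On the notebook's data, the same calls would take y_test and y_predict.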

### `Plotting Decision Tree`



We can plot the decision tree with the help of the Graphviz library by passing export_graphviz the classifier model, the feature names, and the class names of our data.


In [None]:
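# class_names must follow the classifier's own sorted class order,
# so read it from clf_model.classes_ rather than from df['Class Name'].unique()
target = [str(c) for c in clf_model.classes_]
feature_names = list(X.columns)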

In [None]:
from sklearn import tree
import graphviz

# Export the fitted tree to DOT format, then render it with Graphviz
dot_data = tree.export_graphviz(clf_model,
                                out_file=None,
                                feature_names=feature_names,
                                class_names=target,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)

graph
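
If Graphviz is not installed, scikit-learn's export_text function gives a plain-text view of the same kind of tree. The sketch below is self-contained and does not read the CSV file: it assumes data generated by the balance-scale torque rule (all attribute combinations from 1 to 5), and the names `X_demo`, `y_demo`, and `clf_demo` are hypothetical. It also prints classes_, the sorted class order that class_names should follow when plotting.

```python
import itertools
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ['Left weight', 'Left distance', 'Right weight', 'Right distance']

# Assumed stand-in data: every attribute combination, labelled by the torque rule
X_demo = np.array(list(itertools.product(range(1, 6), repeat=4)))
left = X_demo[:, 0] * X_demo[:, 1]
right = X_demo[:, 2] * X_demo[:, 3]
y_demo = np.where(left > right, 'L', np.where(left < right, 'R', 'B'))

# Same hyperparameters as the classifier trained earlier
clf_demo = DecisionTreeClassifier(criterion='gini', random_state=42,
                                  max_depth=3, min_samples_leaf=5)
clf_demo.fit(X_demo, y_demo)

# Sorted class order used internally by the fitted tree
class_names = [str(c) for c in clf_demo.classes_]
print(class_names)

# Text rendering of the split rules
rules = export_text(clf_demo, feature_names=feature_names)
print(rules)
```

Because this tree is fitted on all rows rather than on a training split, its rules may differ from those of clf_model.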