# __Applying Decision Tree__

Let's look at how to construct a decision tree model.

## Step 1: Import Required Libraries and Read the Dataset

- Import NumPy, pandas, Seaborn, and matplotlib.pyplot libraries
- Configure matplotlib settings
- Read the dataset and display the first few rows


In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv('balance-scale.data',sep=',')
df.head()

Unnamed: 0,Class Name,Left weight,Left distance,Right weight,Right distance
0,B,1,1,1,1
1,R,1,1,1,2
2,R,1,1,1,3
3,R,1,1,1,4
4,R,1,1,1,5


__Observation__
- The output shows the first five rows of the dataset: one class label column (Class Name) and four integer feature columns.

## Step 2: Analyze the Dataset

- Display information about the dataset


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 625 entries, 0 to 624
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Class Name      625 non-null    object
 1   Left weight     625 non-null    int64 
 2   Left distance   625 non-null    int64 
 3   Right weight    625 non-null    int64 
 4   Right distance  625 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 24.5+ KB


__Observations__
- Class Name is the target variable that the model will predict; the other four columns are the features.
- Every column has 625 non-null entries, so the dataset contains no missing values.
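
Since Class Name is the target, it also helps to know how its labels are distributed. The sketch below does not read `balance-scale.data`. It assumes the file is the standard UCI Balance Scale dataset, in which every combination of the four attributes (each ranging from 1 to 5) appears once and the label compares left weight times left distance with right weight times right distance. The helper name `balance_class` is illustrative and not part of the notebook.

```python
from collections import Counter
from itertools import product


def balance_class(left_weight, left_distance, right_weight, right_distance):
    """Label one row by comparing the torque on each side of the scale."""
    left = left_weight * left_distance
    right = right_weight * right_distance
    if left > right:
        return "L"
    if left < right:
        return "R"
    return "B"


# Enumerate all 5**4 = 625 attribute combinations (assumed dataset layout).
rows = list(product(range(1, 6), repeat=4))
class_counts = Counter(balance_class(*row) for row in rows)

print(len(rows))     # same row count as the RangeIndex reported by df.info()
print(class_counts)  # B turns out to be far rarer than L and R
```

On the real DataFrame, `df['Class Name'].value_counts()` gives the same kind of summary directly.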

## Step 3: Split the Dataset

- Split the dataset into training and testing sets


In [None]:
from sklearn.model_selection import train_test_split
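# Features: every column except the target
X = df.drop('Class Name', axis=1)
# Target as a 1-D Series, the conventional shape for a single label column
y = df['Class Name']
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)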
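
To make the split sizes concrete, here is a small arithmetic sketch. It assumes that scikit-learn rounds the test portion up when `test_size` is a fraction, as current versions do. The variable names are illustrative.

```python
import math

n_samples = 625   # rows reported by df.info()
test_size = 0.3   # fraction passed to train_test_split

# The test portion is rounded up; the remaining rows are used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

The resulting test size agrees with the total support of 188 shown in the classification report in Step 6.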

## Step 4: Train the Decision Tree Classifier

- Import DecisionTreeClassifier from sklearn.tree
- Train the model using the training dataset


In [None]:
from sklearn.tree import DecisionTreeClassifier
clf_model = DecisionTreeClassifier(criterion="gini", random_state=42, max_depth=3, min_samples_leaf=5)
clf_model.fit(X_train,y_train)
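
The `criterion="gini"` argument tells the tree to choose splits that reduce Gini impurity, which for a node is 1 minus the sum of the squared class proportions. The following is a minimal sketch of that formula, not scikit-learn's internal implementation. The function name `gini_impurity` is illustrative, and the example counts assume the standard 49/288/288 class distribution of the UCI Balance Scale data rather than values read in this notebook.

```python
def gini_impurity(class_counts):
    """Return 1 - sum(p_i ** 2) over the class proportions p_i of a node."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# A pure node (only one class present) has impurity 0.
print(gini_impurity([10, 0, 0]))

# Assumed counts for B, L, and R across the whole dataset.
print(round(gini_impurity([49, 288, 288]), 4))
```

The other two arguments constrain the tree's size: `max_depth=3` allows at most three levels of splits, and `min_samples_leaf=5` requires every leaf to contain at least five training samples.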

## Step 5: Make Predictions

- Predict the outcomes using the testing dataset


In [None]:
y_predict = clf_model.predict(X_test)

## Step 6: Evaluate the Model

- Import accuracy_score, classification_report, and confusion_matrix from sklearn.metrics
- Calculate the accuracy score


In [16]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Assuming y_test contains the true labels and y_predict contains the predicted labels
accuracy = accuracy_score(y_test, y_predict)
print("Accuracy:", accuracy)

cm = confusion_matrix(y_test, y_predict)
print("Confusion Matrix:")
print(cm)

report = classification_report(y_test, y_predict, zero_division=1)
print("Classification Report:")
print(report)

Accuracy: 0.7021276595744681
Confusion Matrix:
[[ 0  8 10]
 [ 0 62 18]
 [ 0 20 70]]
Classification Report:
              precision    recall  f1-score   support

           B       1.00      0.00      0.00        18
           L       0.69      0.78      0.73        80
           R       0.71      0.78      0.74        90

    accuracy                           0.70       188
   macro avg       0.80      0.52      0.49       188
weighted avg       0.73      0.70      0.67       188



__Observation__
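- The output shows the accuracy (about 0.70), the confusion matrix, and the per-class precision, recall, f1-score, and support.
- The first column of the confusion matrix is all zeros, so the model never predicts class B. Its recall is 0.00, and its reported precision of 1.00 is only the placeholder value set by `zero_division=1`, not evidence of correct predictions.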
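
The reported metrics can be re-derived by hand from the confusion matrix, which makes it easier to see where each number comes from. The sketch below copies the matrix from the output above. The variable names are illustrative, and an undefined precision is represented as `None` by choice here.

```python
# Confusion matrix copied from the output above (rows: true class, columns: predicted class).
labels = ["B", "L", "R"]
cm = [
    [0, 8, 10],
    [0, 62, 18],
    [0, 20, 70],
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(labels)))
accuracy = correct / total

recall = {}
precision = {}
for i, label in enumerate(labels):
    true_count = sum(cm[i])                      # row sum: actual members of the class
    predicted_count = sum(row[i] for row in cm)  # column sum: times the class was predicted
    recall[label] = cm[i][i] / true_count
    # Precision is undefined (0/0) when a class is never predicted.
    precision[label] = cm[i][i] / predicted_count if predicted_count else None

print(round(accuracy, 4))
print(precision)
print(recall)
```

Recall divides each diagonal entry by its row sum, and precision divides it by its column sum. Because the column for B sums to zero, its precision has no defined value.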

## Step 7: Display the Decision Tree

- Create a list of target and feature names
- Import export_text from sklearn.tree
- Display the decision tree as text


In [None]:
target = list(df['Class Name'].unique())
feature_names = list(X.columns)

__Observation__
- We now have two lists: target holds the unique class labels, and feature_names holds the names of the feature columns.
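
One caveat about the order of the labels in target: pandas' `unique()` keeps labels in order of first appearance, whereas scikit-learn stores a classifier's classes in sorted order (available as `clf_model.classes_`). The sketch below illustrates the difference without reading the file. It assumes the rows are ordered as all attribute combinations with the last column varying fastest, which is consistent with the first rows shown by `df.head()`. All names are illustrative.

```python
from itertools import product

# Rebuild the labels in the assumed file order.
labels_in_file_order = []
for lw, ld, rw, rd in product(range(1, 6), repeat=4):
    left, right = lw * ld, rw * rd
    labels_in_file_order.append("L" if left > right else "R" if left < right else "B")

# Order of first appearance, which is what unique() preserves.
appearance_order = list(dict.fromkeys(labels_in_file_order))
# Sorted order, which is how scikit-learn stores classifier classes.
sorted_order = sorted(appearance_order)

print(appearance_order)
print(sorted_order)
```

If the label list is later passed as class names to a tree visualizer such as `plot_tree`, using the sorted order (or `clf_model.classes_` itself) avoids attaching the wrong label to a leaf.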

Let's display the decision tree as text using export_text.

In [None]:
from sklearn.tree import export_text
r = export_text(clf_model, feature_names=feature_names)
print(r)

|--- Left weight <= 2.50
|   |--- Right distance <= 1.50
|   |   |--- Left distance <= 2.50
|   |   |   |--- class: R
|   |   |--- Left distance >  2.50
|   |   |   |--- class: L
|   |--- Right distance >  1.50
|   |   |--- Right weight <= 2.50
|   |   |   |--- class: R
|   |   |--- Right weight >  2.50
|   |   |   |--- class: R
|--- Left weight >  2.50
|   |--- Left distance <= 2.50
|   |   |--- Right weight <= 2.50
|   |   |   |--- class: L
|   |   |--- Right weight >  2.50
|   |   |   |--- class: R
|   |--- Left distance >  2.50
|   |   |--- Right distance <= 3.50
|   |   |   |--- class: L
|   |   |--- Right distance >  3.50
|   |   |   |--- class: L



__Observations__
- The output shows how the tree splits the data from the root down to the leaves.
- The root splits on Left weight: rows with a value less than or equal to 2.5 follow the first branch, and rows with a value greater than 2.5 follow the second.
- Within the first branch, Right distance is split at 1.5: less than or equal to 1.5, or greater than 1.5.
- Each path ends in a leaf that assigns a class, so a prediction is made by following the conditions from top to bottom.
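
To make the top-to-bottom reading concrete, the printed rules can be transcribed as nested conditions. The function below is a hand-written sketch of the text output above, not something scikit-learn generates. The name `predict_from_tree` is illustrative, and the final check assumes each attribute ranges from 1 to 5.

```python
from itertools import product


def predict_from_tree(left_weight, left_distance, right_weight, right_distance):
    """Follow the printed export_text rules from the root to a leaf."""
    if left_weight <= 2.5:
        if right_distance <= 1.5:
            return "R" if left_distance <= 2.5 else "L"
        # Right distance > 1.5: both Right weight branches end in class R.
        return "R"
    if left_distance <= 2.5:
        return "L" if right_weight <= 2.5 else "R"
    # Left distance > 2.5: both Right distance branches end in class L.
    return "L"


# First row shown by df.head(): its true class is B, but the tree has no B leaf.
print(predict_from_tree(1, 1, 1, 1))

# Collect the predicted class for every possible input.
all_predictions = {predict_from_tree(*row) for row in product(range(1, 6), repeat=4)}
print(sorted(all_predictions))
```

None of the eight leaves is labeled B, which is consistent with the all-zero first column of the confusion matrix in Step 6.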