This project implements a Decision Tree Classifier in Python. The model classifies data points based on their features. Decision trees are among the most interpretable supervised learning algorithms and can be applied to both classification and regression tasks.
The dataset used in this project is processed within the notebook `DECISION_TREE.ipynb`. It may be a public dataset (e.g., Iris or Titanic) or a custom CSV file.
- Data Loading: Import the dataset into a pandas DataFrame.
- Data Preprocessing: Handle missing values, encode categorical features, and normalize data if required.
- Data Splitting: Divide the dataset into training and testing subsets (commonly a 70-30 or 80-20 split).
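The three steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the toy DataFrame, its column names (`age`, `sex`, `survived`), the median imputation, and the one-hot encoding are all assumptions chosen to keep the example self-contained.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the notebook's dataset.
df = pd.DataFrame({
    "age": [22, 35, None, 41, 29, 53, 38, 47, 31, 26],
    "sex": ["m", "f", "f", "m", "f", "m", "m", "f", "f", "m"],
    "survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
})

# Handle missing values: fill numeric gaps with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Encode categorical features as one-hot indicator columns.
X = pd.get_dummies(df.drop(columns="survived"), columns=["sex"])
y = df["survived"]

# 80-20 split; stratify keeps class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

For a real dataset, the DataFrame would typically come from `pd.read_csv(...)`, and imputation statistics are better computed on the training subset only to avoid leaking information from the test set.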
A Decision Tree splits the dataset into branches based on feature conditions to predict the target variable.
Key Concepts:
- Entropy / Gini Index: Measure the impurity of a node; a pure node (all samples from one class) scores 0.
- Information Gain: The reduction in impurity achieved by a split; the split with the highest gain is chosen at each node.
- Pruning: Prevents overfitting by limiting the depth or complexity of the tree.
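To make the impurity measures concrete, here is a small sketch that computes entropy, the Gini index, and information gain for a hand-made split. The function names and the example labels are illustrative assumptions; scikit-learn computes these quantities internally and does not expose them in this form.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" samples, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print("Parent entropy:", entropy(parent))
print("Parent Gini:", gini(parent))
print("Information gain of the split:", information_gain(parent, left, right))
```

A perfectly balanced two-class node has the maximum entropy of 1 bit and a Gini index of 0.5; a split that separates the classes more cleanly yields a higher information gain.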
- Import libraries (`pandas`, `numpy`, `sklearn`).
- Initialize and train a `DecisionTreeClassifier` from `sklearn.tree`.
- Evaluate model accuracy on test data.
- Visualize the trained tree using `graphviz` or `plot_tree()` from `sklearn`.
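The training and evaluation steps listed above might look like the following sketch. It assumes the Iris dataset bundled with scikit-learn as a stand-in for the notebook's data; the hyperparameters (`criterion="gini"`, `max_depth=3`) and the 80-20 split are illustrative choices, with `max_depth` acting as a simple form of pruning.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris is used here only as an example dataset.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Limiting max_depth restricts tree complexity and helps reduce overfitting.
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
model.fit(X_train, y_train)

# score() returns the mean accuracy on the given data.
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Cost-complexity pruning via the `ccp_alpha` parameter is an alternative to a fixed depth limit.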
- Accuracy Score
- Confusion Matrix
- Precision, Recall, and F1-Score
- Cross-Validation (optional)
Example:
```python
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# model, X_test, and y_test are assumed to come from the training step
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
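Cross-validation, listed as optional among the metrics, can be sketched as below. The Iris dataset, the 5-fold setting, and the `max_depth=3` classifier are assumptions for illustration rather than the notebook's configuration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42)

# For classifiers, an integer cv uses stratified k-fold splitting.
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Averaging over several folds gives a more stable estimate than a single train/test split, which matters most for small datasets.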
Decision Tree visualization helps understand how the model splits data:
```python
from sklearn import tree
import matplotlib.pyplot as plt

# feature_cols and target_names are assumed to be defined earlier in the notebook
plt.figure(figsize=(15, 10))
tree.plot_tree(model, filled=True, feature_names=feature_cols, class_names=target_names)
plt.show()
```
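For the `graphviz` route, a possible sketch uses `export_graphviz` from `sklearn.tree` to produce DOT source. The Iris dataset and the output file name `decision_tree` are assumptions; rendering the DOT source to an image additionally requires the Graphviz system binaries, so that step is left commented out.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_source = export_graphviz(
    clf,
    out_file=None,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
)
print(dot_source[:200])

# Optional rendering (needs the Graphviz binaries in addition to the Python package):
# import graphviz
# graphviz.Source(dot_source).render("decision_tree", format="png")
```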
To run this notebook, install the following dependencies:
```bash
pip install pandas numpy scikit-learn matplotlib graphviz
```
- Open the notebook `DECISION_TREE.ipynb`.
- Run each cell sequentially.
- The notebook will display training accuracy, decision tree plots, and performance metrics.
- The accuracy achieved by the trained model depends on the dataset and its complexity.
- Visual inspection of the decision tree reveals key decision rules.
- The project demonstrates the interpretability and simplicity of Decision Tree models.
Created by: Sujay Roy
Year: 2025
Project Type: Machine Learning - Decision Tree Implementation