# Decision Tree Classifier Example (Iris Dataset)
This notebook demonstrates how to use the decision tree classifier from the rice_ml library to classify data.\
The Iris dataset is used to train, test, and evaluate the model.\
**Goal: Predict the species of an Iris flower from its four measurements.**

##### 1. Setup and Data Loading
Import necessary modules and load the Iris dataset from scikit-learn.\
The Iris dataset is a small classification dataset that has:
- Samples: 150
- Features: 4 measurements (sepal length, sepal width, petal length, petal width)
- Classes: 3 types of Iris flowers (Setosa, Versicolor, Virginica)

In [1]:
import numpy as np
from sklearn.datasets import load_iris

from rice_ml.supervised_learning.decision_trees import DecisionTreeClassifier
from rice_ml.processing.preprocessing import train_test_split
from rice_ml.processing.post_processing import confusion_matrix, accuracy_score

# Loading data
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

print(f"Dataset loaded: {X.shape[0]} samples, {X.shape[1]} features.")
print(f"Classes: {target_names}")

Dataset loaded: 150 samples, 4 features.
Classes: ['setosa' 'versicolor' 'virginica']


**Data Pre-Processing: Splitting the Dataset**\
Before training the model, split the data into two disjoint groups so that the model can be evaluated fairly: a **training set** (for fitting the model) and a **test set** (for evaluating performance on unseen data). We use rice_ml's own `train_test_split` function for this step.

In [2]:
# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=67)
print(f"\nTraining Samples: {X_train.shape[0]}")
print(f"Testing Samples: {X_test.shape[0]}")


Training Samples: 120
Testing Samples: 30
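
To make the splitting step concrete, here is a minimal sketch of what a shuffled hold-out split does. It is written in plain Python and is not rice_ml's implementation: the function name `simple_train_test_split` and its rounding rule are assumptions made for illustration.

```python
import random


def simple_train_test_split(X, y, test_size=0.2, random_state=None):
    """Shuffle sample indices and hold out a fraction of them for testing.

    Hypothetical helper for illustration; rice_ml's train_test_split may
    differ in how it shuffles and rounds.
    """
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")

    n_samples = len(X)
    n_test = int(round(n_samples * test_size))

    # A seeded generator makes the shuffle repeatable across runs.
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)

    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: 10 samples with 2 features each.
X_demo = [[i, i + 1] for i in range(10)]
y_demo = [i % 2 for i in range(10)]

X_tr, X_te, y_tr, y_te = simple_train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(len(X_tr), len(X_te))  # 8 2
```

Because every index lands in exactly one of the two groups, no test sample can leak into training.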


##### 2. Initialize and Train the Model
- Create an instance of DecisionTreeClassifier and fit it to the training data.

In [3]:
# 1. Initialize the Decision Tree Classifier
# max_depth and min_samples_split limit how far the tree can grow,
# which helps keep the model from overfitting the training data.
dtc = DecisionTreeClassifier(
    max_depth=3,
    min_samples_split=5,
    random_state=67
)

print("\nBeginning Decision Tree Training...")

# 2. Fit the model to the training data (X_train, y_train)
dtc.fit(X_train, y_train)

print("Training Complete. Model parameters have been learned.")


Beginning Decision Tree Training...
Training Complete. Model parameters have been learned.
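
Fitting a decision tree amounts to repeatedly choosing the feature threshold that best separates the classes. The sketch below shows one such choice on a single feature using Gini impurity. This notebook does not state which criterion rice_ml uses, so Gini is an assumption here, and the helper names `gini` and `best_threshold` and the sample values are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Find the split 'value <= threshold' with the lowest weighted impurity.

    Returns (threshold, weighted_gini); threshold is None if no split
    improves on the unsplit group.
    """
    best = (None, gini(labels))
    unique = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(unique, unique[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Illustrative petal lengths (cm) and class labels, not taken from the run above.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = [0, 0, 0, 1, 1, 2]

threshold, impurity = best_threshold(petal_length, species)
print(threshold, round(impurity, 4))  # 3.0 0.2222
```

A full tree applies this search to every feature at every node. Under the usual meaning of the hyperparameters, it stops when `max_depth` is reached or a node holds fewer than `min_samples_split` samples.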


##### 3. Prediction and Evaluation
- Use the trained model to predict labels for the unseen test data, then evaluate the predictions with an accuracy score and a confusion matrix.

In [4]:
# 1. Generate predictions on the held-out test set
y_pred = dtc.predict(X_test)

# 2. Calculate the Accuracy Score
accuracy = accuracy_score(y_test, y_pred)
print(f"\nModel Accuracy on Test Set: {accuracy:.4f}")

# 3. Compute the Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix (True vs. Predicted):")
print(conf_matrix)

# The confusion matrix helps visualize where the model made correct 
# (diagonal elements) and incorrect predictions (off-diagonal elements).


Model Accuracy on Test Set: 1.0000

Confusion Matrix (True vs. Predicted):
[[11  0  0]
 [ 0  6  0]
 [ 0  0 13]]
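
The two metrics are closely related: accuracy is the sum of the confusion matrix diagonal divided by the total number of test samples. The following plain-Python sketch shows that relationship. The helpers `confusion_counts` and `accuracy_from_matrix` are illustrative and are not the rice_ml functions used above.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Build a matrix where entry [t][p] counts samples of true class t predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Accuracy = correct predictions (diagonal) / all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# The matrix reported above: all 30 test samples lie on the diagonal.
reported = [[11, 0, 0], [0, 6, 0], [0, 0, 13]]
print(accuracy_from_matrix(reported))  # 1.0

# A toy case with one mistake: a class-1 sample predicted as class 2.
toy = confusion_counts([0, 1, 2, 2, 1], [0, 2, 2, 2, 1], n_classes=3)
print(toy)                        # [[1, 0, 0], [0, 1, 1], [0, 0, 2]]
print(accuracy_from_matrix(toy))  # 0.8
```

The row sums of the reported matrix (11, 6, 13) also show that the 30-sample test set is not evenly balanced across the three species, so each class contributes a different share of the accuracy score.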
