## Decision Tree


#### What is a Decision Tree?
A Decision Tree is a supervised machine learning algorithm that makes decisions by splitting data step by step, similar to how humans make decisions using rules.

It looks like an upside-down tree:

Root → first decision

Internal nodes → further decisions (conditions on features)

Branches → outcomes of each condition

Leaves → final answer (prediction)

Imagine deciding whether to play outside:

                 Is it raining?
                 /            \
              Yes              No
              |                |
        Stay inside         Is it hot?
                              /      \
                           Yes        No
                           |          |
                     Go swimming    Go play

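A decision tree works the same way: each internal node asks a question about a feature, each branch is one answer, and each leaf holds the final prediction.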
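The play-outside example can be reproduced with scikit-learn as a minimal sketch. The two binary features (`raining`, `hot`) and the four training rows are made up for illustration. `export_text` prints the fitted tree as nested rules, with the root split at the top and the predicted class at each leaf.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [raining, hot], where 1 = yes and 0 = no
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = ["Stay inside", "Stay inside", "Go swimming", "Go play"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned tree as if/else rules (root first, class labels at the leaves)
print(export_text(clf, feature_names=["raining", "hot"]))

print(clf.predict([[0, 1]]))  # not raining, hot -> ['Go swimming']
```

With only four rows the tree simply memorizes the table. Real data needs the impurity measures and stopping rules discussed below.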

**Where are Decision Trees Used?**

Medical diagnosis (disease / no disease)

Loan approval (approve / reject)

Customer churn prediction

Spam detection

Fraud detection

Regression problems (predict numbers like price)

**Types of Decision Trees**

**1. Classification Tree**

Output: category

Examples:

Spam / Not Spam

Yes / No

Fraud / Not Fraud

**2. Regression Tree**

Output: number

Examples:

House price

Salary

Temperature

**Basic Terminology:**

| Term          | Meaning                       |
| ------------- | ----------------------------- |
| Root Node     | First split                   |
| Internal Node | Decision node that tests a feature |
| Leaf Node     | Final output (prediction)     |
| Branch        | Outcome of a decision         |
| Depth         | Number of levels from the root to the deepest leaf |
| Split         | Condition used to divide data |


**How Does a Decision Tree Work?**

Core idea:

At each step, choose the feature and split point (threshold) that separate the data most cleanly, i.e. that produce the purest child nodes.

This is done using impurity measures.

**Impurity: What Does It Mean?**


Impurity = how mixed the classes are in a node.

Pure node → only one class

Impure node → mixed classes



#### **Impurity Measures (Classification):**

   **1. Gini Index (Most Common)**

Formula:

$$\text{Gini} = 1 - \sum_{i} p_i^2$$

Where $p_i$ is the probability of class $i$ in the node (the fraction of the node's samples that belong to class $i$).

Example:

Data: 10 samples

6 Yes

4 No

$$\text{Gini} = 1 - (0.6^2 + 0.4^2) = 1 - (0.36 + 0.16) = 0.48$$

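Lower Gini → purer node. To compare splits, the tree uses the weighted average Gini of the child nodes (each child weighted by its share of the samples); the lower this value, the better the split.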
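The Gini calculation above, and the weighted Gini used to compare splits, can be checked with a few lines of plain Python. This is a minimal sketch that works from class counts; the split into `[4, 0]` and `[2, 4]` is a hypothetical example, not part of the data above.

```python
def gini(counts):
    """Gini impurity from class counts, e.g. [6, 4] for 6 Yes and 4 No."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Weighted average Gini of child nodes; each child is a list of class counts."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)


print(gini([6, 4]))   # parent node from the example: approximately 0.48
print(gini([10, 0]))  # pure node: 0.0

# Hypothetical split of the same 10 samples into [4 Yes, 0 No] and [2 Yes, 4 No]
print(weighted_gini([[4, 0], [2, 4]]))  # approximately 0.267, lower than 0.48
```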



**2. Entropy**

Formula:

$$\text{Entropy} = -\sum_{i} p_i \log_2(p_i)$$


Entropy measures uncertainty.

Entropy = 0 → pure node

Entropy = 1 → maximally mixed for two classes (a 50/50 split); with $k$ classes the maximum is $\log_2 k$
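A matching sketch for entropy, again computed from class counts with a plain-Python helper (not a library function). Classes with zero samples are skipped, following the convention $0 \log_2 0 = 0$.

```python
import math


def entropy(counts):
    """Entropy in bits from class counts; zero-count classes contribute nothing."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


print(entropy([10, 0]))    # pure node: 0.0
print(entropy([5, 5]))     # two classes, 50/50: 1.0
print(entropy([6, 4]))     # the 6 Yes / 4 No example: about 0.971
print(entropy([1, 1, 1]))  # three equally likely classes: log2(3), about 1.585
```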

**3. Information Gain**

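Information Gain = entropy before the split − weighted average entropy of the child nodes after the split:

$$\text{IG} = \text{Entropy}(\text{parent}) - \sum_{j} \frac{n_j}{n}\,\text{Entropy}(\text{child}_j)$$

where $n_j$ is the number of samples in child $j$ and $n$ is the number in the parent. The tree selects the split with the maximum information gain.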
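Information gain for the same hypothetical split of the 6 Yes / 4 No node can be computed as follows. This is a minimal sketch that redefines the entropy helper and weights each child by its share of the parent's samples.

```python
import math


def entropy(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the sample-weighted average entropy of the children."""
    n = sum(parent)
    after = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - after


# Hypothetical split of 10 samples (6 Yes, 4 No) into [4 Yes, 0 No] and [2 Yes, 4 No]
print(round(information_gain([6, 4], [[4, 0], [2, 4]]), 3))  # 0.42

# A split that keeps the same class mix in both children gains nothing
print(information_gain([6, 4], [[3, 2], [3, 2]]))
```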

#### **How Does the Tree Choose a Split?**

For each feature:

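Try all possible split values (for a numeric feature, thresholds between consecutive sorted values)

Calculate the weighted impurity of the resulting child nodes

Choose the split with the lowest impurity (equivalently, the largest impurity decrease)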

This is done greedily (locally best decision).
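The greedy search can be sketched in plain Python. For every feature, it tries each midpoint between consecutive distinct values as a threshold and keeps the split with the lowest weighted Gini. This is a simplified illustration (scikit-learn's own splitter is optimized Cython code), and the six-row dataset is hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Exhaustive greedy search over every feature and every midpoint threshold.

    Returns (feature_index, threshold, weighted_gini) for the split 'x[feature] <= threshold'.
    """
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:  # keep the lowest weighted impurity seen so far
                best = (f, t, score)
    return best


# Hypothetical data: feature 0 is noise, feature 1 separates the classes at 3.5
X = [[5, 1], [1, 2], [4, 3], [2, 4], [6, 5], [3, 6]]
y = ["No", "No", "No", "Yes", "Yes", "Yes"]
print(best_split(X, y))  # (1, 3.5, 0.0)
```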

#### **Decision Tree Algorithm (High Level):**

Start with full dataset

Calculate impurity

Split on best feature

Repeat recursively

Stop when a stopping condition is met (e.g. the node is pure, `max_depth` is reached, or too few samples remain)
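Putting these steps together, a toy recursive builder might look like the sketch below. It is not scikit-learn's implementation: the helper names (`build_tree`, `predict`) are hypothetical, nodes are plain dicts, and, as a simplification, a node is split only if the split strictly lowers Gini impurity. The training rows reuse the hypothetical play-outside table.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold) with the lowest weighted Gini, or None if nothing improves on the parent."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Split recursively until a node is pure, max_depth is reached, or no split helps."""
    split = None
    if len(set(y)) > 1 and depth < max_depth:
        split = best_split(X, y)
    if split is None:
        # Leaf: predict the majority class of the samples that reached it
        return {"leaf": Counter(y).most_common(1)[0][0]}
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def predict(node, row):
    """Follow branches from the root until a leaf is reached."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


# Hypothetical training data: [raining, hot] -> activity
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = ["Stay inside", "Stay inside", "Go swimming", "Go play"]

tree = build_tree(X, y)
print(tree)
print(predict(tree, [0, 1]))  # Go swimming
```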

#### **Advantages of Decision Trees**

Easy to understand

No feature scaling needed

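Handles categorical + numerical data (in principle; scikit-learn's implementation expects categorical features to be encoded as numbers first)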

Works with non-linear data



#### **Disadvantages of Decision Trees**

Overfitting

Sensitive to small data changes (a slightly different training set can produce a very different tree)

Greedy algorithm (not globally optimal)

#### **Overfitting in Decision Trees**
What is overfitting?

The model learns noise in the training data instead of the underlying pattern. Typical signs:

Very deep tree

100% training accuracy

Poor test accuracy
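One way to see overfitting is to compare training and test accuracy for an unrestricted tree and a depth-limited one. This sketch swaps in scikit-learn's breast cancer dataset, which is larger and noisier than Iris. Exact numbers depend on the split, so look for the gap between train and test accuracy rather than specific values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unrestricted tree: keeps splitting until leaves are pure
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth-limited tree for comparison
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

for name, model in [("unrestricted", full), ("max_depth=2", shallow)]:
    print(
        f"{name}: depth={model.get_depth()}, "
        f"train acc={model.score(X_train, y_train):.3f}, "
        f"test acc={model.score(X_test, y_test):.3f}"
    )
```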

#### **How to Control Overfitting (Pruning)**
**1. Pre-Pruning (Early Stopping)**

Limit tree growth:

max_depth

min_samples_split

min_samples_leaf

max_features

**2. Post-Pruning**

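Grow the full tree → cut branches that add little predictive value
(Less common in practice than pre-pruning; scikit-learn supports it through minimal cost-complexity pruning with the `ccp_alpha` parameter)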
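Here is a sketch of post-pruning with scikit-learn's minimal cost-complexity pruning. `cost_complexity_pruning_path` returns the effective alphas for a fully grown tree, and refitting with increasing `ccp_alpha` yields progressively smaller trees. The breast cancer dataset is again a stand-in for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Effective alphas at which subtrees of the fully grown tree get pruned away
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# One tree per alpha: larger ccp_alpha -> more aggressive pruning -> fewer nodes
pruned = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    for alpha in path.ccp_alphas
]

for alpha, clf in zip(path.ccp_alphas, pruned):
    print(f"ccp_alpha={alpha:.5f}  nodes={clf.tree_.node_count}  "
          f"test acc={clf.score(X_test, y_test):.3f}")
```

The largest alpha prunes the tree down to a single node. In a real workflow, `ccp_alpha` would be chosen by cross-validation rather than by looking at test accuracy.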

#### **Why a Single Train-Test Split Is Not Enough**
Problem with a single train-test split

When we do this:

`train_test_split(X, y, test_size=0.2)`


The result depends heavily on how the data is split.

One split may give high accuracy

Another split may give low accuracy

The performance estimate becomes unstable
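This instability is easy to observe. Keep the model and data fixed and change only the `random_state` of `train_test_split`. The sketch uses Iris and the same `max_depth=3` tree as the later examples. What matters is the spread of accuracies across seeds, not any single value.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same model, same data: only the random train/test split changes
accuracies = []
for seed in range(10):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
    accuracies.append(model.score(X_test, y_test))

print("Accuracy per split:", [round(a, 3) for a in accuracies])
print("Min:", min(accuracies), "Max:", max(accuracies))
```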


#### **K-Fold Cross Validation**
**What is K-Fold Cross Validation?**

Instead of training once, we:

Split data into K equal parts (folds)

Train the model K times

**Each time:**

Use K-1 folds for training

Use 1 fold for validation

Take the average performance
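The K-fold procedure above can be written out by hand with `StratifiedKFold`, which keeps class proportions similar in each fold. This is a minimal sketch. For a classifier, `cross_val_score(..., cv=5)` uses the same stratified, unshuffled 5-fold splits, so the manual scores should match it.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3, random_state=42)

skf = StratifiedKFold(n_splits=5)  # K = 5 folds, class proportions preserved

fold_scores = []
for train_idx, val_idx in skf.split(X, y):
    fold_model = clone(model)                      # fresh, unfitted copy for each fold
    fold_model.fit(X[train_idx], y[train_idx])     # train on K-1 folds
    fold_scores.append(fold_model.score(X[val_idx], y[val_idx]))  # validate on the held-out fold

print("Per-fold accuracy:", np.round(fold_scores, 3))
print("Mean accuracy:", np.mean(fold_scores))

# Same splits as cross_val_score with cv=5 for a classifier
print(np.allclose(fold_scores, cross_val_score(model, X, y, cv=5)))  # True
```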

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

# Load the Iris dataset (150 samples, 4 numeric features, 3 classes)
data = load_iris()
X = data.data
y = data.target

# Pre-pruned tree: Gini impurity, at most 3 levels deep
model = DecisionTreeClassifier(
    criterion="gini",
    max_depth=3,
    random_state=42
)

# For a classifier, an integer cv uses stratified folds
scores = cross_val_score(
    model,
    X,
    y,
    cv=5,        # 5-Fold Cross Validation
    scoring="accuracy"
)

print("Accuracy for each fold:", scores)    # accuracy on each held-out fold
print("Mean Accuracy:", scores.mean())       # average over folds: more stable than a single split
```

```
Accuracy for each fold: [0.96666667 0.96666667 0.93333333 1.         1.        ]
Mean Accuracy: 0.9733333333333334
```


#### **Important Hyperparameters**

| Parameter         | Meaning               |
| ----------------- | --------------------- |
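| max_depth         | Maximum depth of the tree |
| min_samples_split | Minimum number of samples a node needs before it can be split |
| min_samples_leaf  | Minimum number of samples required in each leaf |
| max_features      | Number of features considered when searching for the best split |
| criterion         | Impurity measure: "gini" or "entropy" |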


#### **Why Is Tuning Needed?**

Bad hyperparameters cause:

Underfitting (tree too shallow)

Overfitting (tree too deep)

Goal:

Find the best combination that generalizes well.
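A quick way to see underfitting and overfitting as a function of one hyperparameter is scikit-learn's `validation_curve`, which reports training and cross-validated accuracy for each value of `max_depth`. Here is a sketch on Iris. A depth-1 tree has only two leaves, so it cannot separate all three classes and underfits. Training accuracy tends to keep rising with depth even after validation accuracy stops improving.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
depths = [1, 2, 3, 4, 5, 6, 8]

# Train and 5-fold validation accuracy for each candidate max_depth
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=42),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="accuracy",
)

train_means = train_scores.mean(axis=1)
val_means = val_scores.mean(axis=1)
for d, tr, va in zip(depths, train_means, val_means):
    print(f"max_depth={d}: train={tr:.3f}, validation={va:.3f}")
```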

#### **Hyperparameter Tuning Using GridSearchCV**

**What is GridSearchCV?**

GridSearchCV:

Tries all combinations of given hyperparameters

Uses cross-validation internally

Refits the best combination on the full dataset (with the default `refit=True`) and exposes it as `best_estimator_`

#### **GridSearchCV = Hyperparameter tuning + K-Fold Cross Validation**
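To make this equation concrete, the sketch below runs a small grid search by hand with `ParameterGrid` and `cross_val_score`, then compares the result with `GridSearchCV` on the same grid. The grid is a reduced, illustrative version of the one used in this section.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"criterion": ["gini", "entropy"], "max_depth": [2, 3, 4, None]}  # small example grid

# Manual version: every combination (2 x 4 = 8), each scored with 5-fold CV
best_params, best_score = None, -1.0
for params in ParameterGrid(param_grid):
    model = DecisionTreeClassifier(random_state=42, **params)
    score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    if score > best_score:
        best_params, best_score = params, score

# Same search done by GridSearchCV
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

print("Manual best:", best_params, best_score)
print("GridSearchCV best:", grid.best_params_, grid.best_score_)
```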

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Candidate values for each hyperparameter: 2 * 5 * 3 * 3 = 90 combinations
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4]
}

dt = DecisionTreeClassifier(random_state=42)

grid = GridSearchCV(
    estimator=dt,
    param_grid=param_grid,
    cv=5,                # 5-Fold CV
    scoring="accuracy",
    n_jobs=-1            # Use all CPU cores
)

# 90 combinations x 5 folds = 450 fits, then one refit of the best combination on all of X, y
grid.fit(X, y)

print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)
```

```
Best Parameters: {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 1, 'min_samples_split': 2}
Best CV Accuracy: 0.9733333333333334
```


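```python
# Best estimator found by the search, already refit on all of X, y (default refit=True)
best_model = grid.best_estimator_
print(best_model)
```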