# DecisionTreeClassifier

### Load & Explore IRIS Dataset

``` python
from sklearn.datasets import load_iris
import pandas as pd

# Load iris
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)
```

------------------------------------------------------------------------

### Train-Test Split

``` python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

------------------------------------------------------------------------

### Train the Decision Tree

``` python
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(max_depth=3, random_state=42)
dt.fit(X_train, y_train)
```

------------------------------------------------------------------------

### Evaluate It

``` python
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

y_pred = dt.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
```

------------------------------------------------------------------------

### Visualize the Tree

``` python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(12, 8))
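# class_names is passed as a plain list; some scikit-learn versions reject a NumPy array here
plot_tree(dt, filled=True, feature_names=iris.feature_names, class_names=list(iris.target_names))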
plt.show()
```

------------------------------------------------------------------------

### Save Model

``` python
import joblib
joblib.dump(dt, "dt_iris.pkl")
```
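
A saved model is only useful if it can be loaded back. The following is a minimal sketch, assuming `joblib` and scikit-learn are installed. It retrains a small tree so the snippet runs on its own, and the temporary folder and the names `tree`, `loaded_tree`, and `model_path` are illustrative choices, not part of the steps above.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Train a small tree so this snippet is self-contained
X_demo, y_demo = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_demo, y_demo)

# Save to a temporary folder (illustrative path), then load it back
model_path = os.path.join(tempfile.mkdtemp(), "dt_iris.pkl")
joblib.dump(tree, model_path)
loaded_tree = joblib.load(model_path)

# The reloaded model should give the same predictions as the original
same_predictions = bool((loaded_tree.predict(X_demo) == tree.predict(X_demo)).all())
print("Predictions match after reload:", same_predictions)
```

Pickled models are generally only safe to load with the same scikit-learn version that saved them, and only from sources you trust.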

### Decision Tree

### 🔹 What is a Decision Tree?

A **Decision Tree** is a supervised ML algorithm used for
**classification** and **regression**.

It mimics human decision-making using a tree-like model of decisions.

👉 Each internal node: a feature (condition)

👉 Each branch: outcome of the condition

👉 Each leaf: final decision/class

> Think of it like playing 20 Questions with your data.

------------------------------------------------------------------------

### 🔹 Real-World Analogy:

Imagine you’re picking a fruit:

-   Is it red?
    -   Yes → Is it round?
        -   Yes → Apple
        -   No → Strawberry
-   Is it yellow?
    -   Yes → Banana!

Simple if-else checks! That’s a decision tree.
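
The fruit analogy can be written out literally as nested if-else checks. This is only a hand-written sketch: the function name `classify_fruit` and the `"Unknown"` fallback are assumptions added for illustration, since a real decision tree learns its questions from data.

```python
def classify_fruit(color: str, is_round: bool) -> str:
    """Hand-written decision tree for the fruit analogy."""
    if color == "red":            # root question: is it red?
        if is_round:              # follow-up question: is it round?
            return "Apple"
        return "Strawberry"
    if color == "yellow":         # second branch: is it yellow?
        return "Banana"
    return "Unknown"              # assumption: fallback for any other color


print(classify_fruit("red", True))      # Apple
print(classify_fruit("red", False))     # Strawberry
print(classify_fruit("yellow", False))  # Banana
```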

------------------------------------------------------------------------

### 🔹 Types of Splitting Criteria (How it Decides the Best Question):

When building a tree, it chooses questions that give the **purest
split** using:

### 1. **Gini Impurity**

-   **What it means:** Measures how **impure** a node is. Lower Gini =
    purer node = better split!

-   **Formula:**

    $$Gini = 1 - \sum_{i} p_i^2$$

    where $p_i$ is the probability of class **i** at the node.

-   **Example:**

    If a node has 50% Class A and 50% Class B:

    $$Gini = 1 - (0.5^2 + 0.5^2) = 0.5$$

    If a node has 100% Class A:

    $$Gini = 1 - 1^2 = 0 \quad \text{(pure!)}$$

-   **Use:**

    -   Fast to compute
    -   Default in **sklearn**

-   Range: 0 (pure) to a maximum of 0.5 for two classes (in general,
    $1 - 1/k$ for $k$ classes).
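
To make the Gini arithmetic concrete, here is a small pure-Python sketch. The helper name `gini_impurity` is an illustrative choice and is not part of scikit-learn's public API.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini = 1 - sum(p_i^2) over the classes present in `labels`."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: treat an empty node as pure
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


print(gini_impurity(["A", "B"]))            # 50/50 node -> 0.5
print(gini_impurity(["A", "A", "A", "A"]))  # pure node  -> 0.0
```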

### 2. **Entropy (Information Gain)**

-   **Entropy:** Measures **uncertainty** or **disorder** in a node.

    $$Entropy = -\sum_{i} p_i \log_2 p_i$$

    where $p_i$ is the probability of class **i** at the node.

-   **Information Gain:** The difference in entropy **before** and
    **after** the split.

    $$IG = Entropy(parent) - \text{weighted average entropy of the children}$$

-   **Example:**

    If a node is mixed (like 60% yes, 40% no), entropy is high (about
    0.97 bits).

    If it’s all yes or all no → entropy is zero.

-   **Use:**

    -   More **theoretically grounded** (from Information Theory)
    -   Slightly slower than Gini, because it needs logarithms
    -   Good for **feature selection**
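
The entropy values mentioned above can be checked with a few lines of Python. This is a sketch using the standard Shannon entropy in bits; the function name `entropy` and the toy label lists are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


mixed = ["yes"] * 6 + ["no"] * 4   # 60% yes, 40% no
even = ["yes"] * 5 + ["no"] * 5    # 50/50 split
pure = ["yes"] * 10                # all yes

print(round(entropy(mixed), 3))  # 0.971
print(entropy(even))             # 1.0
print(entropy(pure))             # 0.0
```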

**So is Information Gain just another name for Entropy?**

No, Information Gain and Entropy are not the same thing, but they are
very closely related concepts in the context of Decision Trees.

Here’s the distinction:

-   **Entropy (H):**
    -   **What it is:** Entropy is a **measure of impurity, disorder, or
        uncertainty** within a set of data.
    -   **How it works:** In a decision tree, for a given node, entropy
        quantifies how mixed the class labels are.
        -   If all data points in a node belong to the same class
            (perfectly pure), the entropy is 0.
        -   If the data points are evenly distributed across multiple
            classes (maximum impurity/uncertainty), the entropy is high
            (e.g., 1 for a binary classification problem with 50/50
            split).
    -   **Purpose:** It tells you how “messy” a node is before and after
        a potential split.
-   **Information Gain (IG):**
    -   **What it is:** Information Gain is the **reduction in entropy**
        achieved by splitting the data on a particular attribute
        (feature). It measures how much “information” a feature provides
        about the class labels.
    -   **How it works:** You calculate the entropy of the parent node
        ($H_{parent}$) and then the weighted average entropy of the child
        nodes after the split ($H_{children}$). The Information Gain is the
        difference:

        $$IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v)$$

        Where:
        -   $S$ is the parent set of examples.
        -   $A$ is the attribute being split on.
        -   $Values(A)$ are the possible values of attribute $A$.
        -   $S_v$ is the subset of $S$ for which attribute $A$ has value $v$.
        -   $\frac{|S_v|}{|S|}$ is the proportion of examples in $S$ that
            have value $v$ for attribute $A$.
    -   **Purpose:** The decision tree algorithm uses Information Gain
        to select the best feature to split on at each step. It chooses
        the feature that provides the **highest Information Gain**,
        meaning it most effectively reduces the overall impurity of the
        dataset.
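
As a rough illustration of the Information Gain calculation, the sketch below scores two candidate splits of the same parent node. The marble labels are made-up toy data, and `entropy` and `information_gain` are illustrative helper names, not library functions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """IG = H(parent) - sum(|child| / |parent| * H(child))."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy data (made up for illustration): 4 red and 4 blue marbles
parent = ["red"] * 4 + ["blue"] * 4

perfect_split = [["red"] * 4, ["blue"] * 4]           # each child is pure
useless_split = [["red", "red", "blue", "blue"]] * 2  # children as mixed as the parent

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A tree builder would evaluate every candidate split this way and keep the one with the highest gain.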

**Analogy:**

Imagine you have a bag of marbles, some red and some blue.

-   **Entropy** is how mixed up the colors are in the bag. If you have
    equal numbers of red and blue, the entropy is high (very mixed). If
    you only have red marbles, the entropy is low (not mixed at all).
-   **Information Gain** is the benefit you get from sorting the
    marbles. If you have a way to pick up a handful of marbles and
    suddenly they are all red, you’ve gained a lot of “information”
    about their color, and the “impurity” of that handful has
    drastically reduced. The process of splitting the data by a feature
    is like finding a way to sort the marbles and reduce their
    mixed-up-ness.

🧠 **TL;DR**: Both measure impurity. Gini is slightly cheaper to compute;
Entropy comes from Information Theory. In practice the two usually pick
very similar splits.

**Which one should you use, and when?**

------------------------------------------------------------------------

### 🥇 **Gini Impurity**

✅ **Faster to compute** (no logarithms!)

✅ Works great in practice

✅ Default in `sklearn`’s `DecisionTreeClassifier`

❌ Lacks the information-theoretic interpretation that entropy has

❌ Slightly less sensitive to class imbalance

------------------------------------------------------------------------

### 🥈 **Information Gain (Entropy)**

✅ Based on **Information Theory**

✅ More **interpretable** for humans

✅ Slightly better for **imbalanced datasets** or when split quality
needs precision

❌ Slower (due to logs)

❌ Can favor attributes with **many categories** (unless you use Gain
Ratio)

------------------------------------------------------------------------

### ⚔️ So… **Which to use?**

| Situation                         | Recommendation                   |
|-----------------------------------|----------------------------------|
| You’re using `sklearn`            | Stick with **Gini** (default) ✅ |
| You want theory-based purity      | Try **Information Gain** 🧠      |
| Data is **highly imbalanced**     | IG might edge out Gini 💥        |
| You’re aiming for **speed**       | Gini is faster 🏃‍♂️                |
| You want to impress in interviews | Know **both**! 😎🎓              |

------------------------------------------------------------------------

### 🎯 Final Verdict (Practical Advice):

> Use Gini by default (fast, solid, works well).
>
> Try **Information Gain** if your tree seems weird, you’re dealing with
> **imbalanced data**, or you’re just curious to compare!
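
One way to act on this advice is to compare both criteria with cross-validation. The sketch below uses the Iris data and the same `max_depth=3` as earlier; the variable names are illustrative, and on a dataset this small and clean the two scores may well be identical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_iris(return_X_y=True)

scores = {}
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=3, random_state=42)
    # Mean accuracy over 5 cross-validation folds for this criterion
    scores[criterion] = cross_val_score(tree, X_demo, y_demo, cv=5).mean()

for criterion, score in scores.items():
    print(f"{criterion}: mean CV accuracy = {score:.3f}")
```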

------------------------------------------------------------------------

### 🔹 DecisionTreeClassifier in `sklearn`

``` python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

------------------------------------------------------------------------

### 🔹 Visualizing the Tree

``` python
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(12,6))
# Iris has three classes, so use its own class names rather than ['No', 'Yes']
plot_tree(model, feature_names=list(X.columns), class_names=list(iris.target_names), filled=True)
plt.show()
```

Or export it nicely:

``` python
from sklearn.tree import export_text
print(export_text(model, feature_names=list(X.columns)))
```

------------------------------------------------------------------------

### 🔹 Important Parameters

| Parameter           | Description                                               |
|------------------------------------|------------------------------------|
| `criterion`         | `'gini'` (default), `'entropy'`, or `'log_loss'`          |
| `max_depth`         | Max levels in the tree                                    |
| `min_samples_split` | Min samples needed to split                               |
| `min_samples_leaf`  | Min samples in a leaf                                     |
| `max_features`      | How many features to consider when looking for best split |
| `random_state`      | Reproducibility                                           |

------------------------------------------------------------------------

### 🔹 Pros vs Cons

✅ **Pros**:

-   Easy to understand/visualize
-   No need for feature scaling
-   Handles both numeric and categorical data in principle (in
    `sklearn`, categorical features must first be encoded as numbers)

❌ **Cons**:

-   Prone to overfitting (deep trees)
-   Sensitive to data noise
-   Less accurate than ensemble methods (like Random Forest)

# Applications of Decision Trees

-   **Loan Approval in Banking**: A bank needs to decide whether to
    approve a loan application based on customer profiles.
    -   Input features include income, credit score, employment status,
        and loan history.
    -   The decision tree predicts loan approval or rejection, helping
        the bank make quick and reliable decisions.
-   **Medical Diagnosis:** A healthcare provider wants to predict
    whether a patient has diabetes based on clinical test results.
    -   Features like glucose levels, BMI, and blood pressure are used
        to make a decision tree.
    -   The tree classifies patients as diabetic or non-diabetic,
        assisting doctors in diagnosis.
-   **Predicting Exam Results in Education:** A school wants to predict
    whether a student will pass or fail based on study habits.
    -   Data includes attendance, time spent studying, and previous
        grades.
    -   The decision tree identifies at-risk students, allowing teachers
        to provide additional support.

------------------------------------------------------------------------

### 🔹 Mini Project Ideas

-   🍷 Classify wine quality using `sklearn.datasets.load_wine()`
-   🧬 Predict cancer from `load_breast_cancer()`
-   🚢 Survival prediction with Titanic dataset

------------------------------------------------------------------------

### 🔹 Common Mistakes to Avoid

-   Not setting `max_depth` → overfitting alert 🚨
-   Using unscaled data? No worries, DT doesn’t care 😎
-   Confusing **Gini** and **Entropy**—they’re *friends*, not foes.

------------------------------------------------------------------------

## 🛡️ Techniques to Prevent Overfitting in Decision Trees

Let’s **chop the tree wisely** 🌳✂️

------------------------------------------------------------------------

### 1. **`max_depth`**

> Limit how deep the tree can grow.

``` python
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(max_depth=4)
```

🪚 Stops tree from splitting forever.

------------------------------------------------------------------------

### 2. **`min_samples_split`**

> Minimum samples needed to split a node.

``` python
DecisionTreeClassifier(min_samples_split=10)
```

📉 Bigger number = fewer splits = simpler tree.

------------------------------------------------------------------------

### 3. **`min_samples_leaf`**

> Minimum samples required in a leaf node.

``` python
DecisionTreeClassifier(min_samples_leaf=5)
```

🌿 Prevents tiny, overfitted leaves with 1 or 2 rows.

------------------------------------------------------------------------

### 4. **`max_leaf_nodes`**

> Restrict number of leaf nodes (final decisions)

``` python
DecisionTreeClassifier(max_leaf_nodes=15)
```

🌲 Keeps it compact.

------------------------------------------------------------------------

### 5. **`max_features`**

> Randomly select only some features per split.

``` python
DecisionTreeClassifier(max_features="sqrt")
```

🌈 Adds randomness → good for ensembles too.

------------------------------------------------------------------------

### 6. **Pruning** (Post-training trimming)

In `sklearn`, use **Cost Complexity Pruning**:

``` python
DecisionTreeClassifier(ccp_alpha=0.01)
```

✂️ Removes branches whose impurity reduction doesn’t justify their added complexity. A larger `ccp_alpha` prunes more.
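
Rather than guessing a value like `0.01`, candidate alphas can be read off the tree itself with `cost_complexity_pruning_path`. The sketch below is one possible way to inspect them on the Iris data. The split settings mirror the earlier train-test split, and the variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Alphas at which the fully grown tree would lose another subtree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_tr, y_tr)

leaf_counts = []
for raw_alpha in path.ccp_alphas:
    alpha = max(float(raw_alpha), 0.0)  # guard against tiny negative values from rounding
    pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X_tr, y_tr)
    leaf_counts.append(pruned.get_n_leaves())
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  test acc={pruned.score(X_te, y_te):.3f}")
```

In a real project the alpha would be chosen with cross-validation on the training data rather than by looking at test accuracy.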

------------------------------------------------------------------------

### 7. **Cross-validation**

> Tune hyperparameters using GridSearchCV or RandomizedSearchCV to find
> sweet spot 🎯
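
A minimal `GridSearchCV` sketch for a decision tree might look like the following. The grid values are arbitrary starting points chosen for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_iris(return_X_y=True)

# Illustrative grid: every combination below is tried
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [1, 3, 5],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

`RandomizedSearchCV` takes the same estimator and samples from the grid instead of trying every combination, which helps when the grid is large.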

------------------------------------------------------------------------

### 🔮 BONUS: Use Random Forests 🌳🌳🌳

Random Forest = **many randomized trees** (each trained on a bootstrap sample with random feature subsets) whose predictions are combined

✅ Way more resistant to overfitting than a single deep tree.
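
A quick way to see the difference is to cross-validate an unrestricted single tree against a forest. This is only a sketch on the Iris data with illustrative variable names. On a dataset this small and clean the gap may be negligible, and it tends to show more clearly on noisier data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_iris(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=42)               # one fully grown tree
forest = RandomForestClassifier(n_estimators=100, random_state=42)  # 100 randomized trees

tree_score = cross_val_score(single_tree, X_demo, y_demo, cv=5).mean()
forest_score = cross_val_score(forest, X_demo, y_demo, cv=5).mean()

print(f"Single tree:   {tree_score:.3f}")
print(f"Random forest: {forest_score:.3f}")
```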

------------------------------------------------------------------------

## 🧠 TL;DR Cheat Sheet:

| Parameter           | What it controls | Helps with     |
|---------------------|------------------|----------------|
| `max_depth`         | Tree depth       | Complexity     |
| `min_samples_split` | When to split    | Over-splitting |
| `min_samples_leaf`  | Small leaves     | Noise          |
| `ccp_alpha`         | Post-pruning     | Weak branches  |
| `max_leaf_nodes`    | Limits decisions | Simplicity     |